SOME FUNDAMENTAL THEOREMS IN MATHEMATICS


OLIVER KNILL 


ABSTRACT. An expository hitchhiker's guide to some theorems in mathematics.


Criteria for the current list of 250 theorems are whether the result can be formulated elegantly,
whether it is beautiful or useful and whether it could serve as a guide [6] without leading
to panic. The order is not a ranking; the entries follow the timeline in which they were written
down. Since it has been stated that “a mathematical theorem only becomes beautiful if presented
as a crown jewel within a context”, we sometimes try to give some context. Of course, any
such list of theorems is a matter of personal preferences, taste and limitations. The number
of theorems is arbitrary: the initial obvious goal was 42, but that number was eventually
surpassed, as it is hard to stop once started. As a compensation, there are 42 “tweetable”
theorems with included proofs. More comments on the choice of the theorems are included in
an epilogue. For literature on general mathematics, see [143], for history [76] [349], for popular,
beautiful or elegant things [206] [2] [130] [157] (3T) [15] [268] [77]. For comprehensive overviews
of large parts of mathematics, see [77] [170] [71] [53] [607], or for predictions on developments
[49]. For reflections about mathematics in general, see [150] [466] [47] [813] [449] [102] [575].
Encyclopedic source examples are [649].


This is a live document which is in the process of being extended. Thanks so far to Johan 
Commelin, Mikhail Katz, David McCarthy, Kapil Paranjape, Jordan Stoyanov, Michael Somos, 
Ross Rosenwald for some valuable comments or corrections. 


1. ARITHMETIC 


Let $\mathbb{N} = \{0, 1, 2, 3, \dots\}$ be the set of natural numbers. A number $p \in \mathbb{N}$, $p > 1$ is prime
if $p$ has no factors different from $1$ and $p$. With a prime factorization $n = p_1 \cdots p_k$, we
understand the prime factors $p_i$ of $n$ to be ordered as $p_i \leq p_{i+1}$. The fundamental theorem
of arithmetic is


Theorem: Every $n \in \mathbb{N}$, $n > 1$ has a unique prime factorization.


Euclid anticipated the result. Carl Friedrich Gauss gave in 1798 the first proof in his monograph
“Disquisitiones Arithmeticae”. Within abstract algebra, the result is the statement that the
ring of integers $\mathbb{Z}$ is a unique factorization domain. For a literature source, see [363]. For
more general number theory literature, see [333] [119].
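
As an illustration, the factorization can be computed by trial division. The following is only a minimal sketch; the function name `prime_factorization` is chosen here for illustration and does not appear in the cited literature.

```python
def prime_factorization(n):
    """Return the prime factors of n > 1 in non-decreasing order."""
    if n < 2:
        raise ValueError("n must be an integer larger than 1")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:      # divide out p as often as possible
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                  # the remaining cofactor is itself prime
        factors.append(n)
    return factors

print(prime_factorization(360))   # [2, 2, 2, 3, 3, 5]
```

Because the factors are produced in non-decreasing order, two calls with the same $n$ always return the same list, which mirrors the ordering convention used above.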
2. GEOMETRY


Given an inner product space $(V, \cdot)$ with dot product $v \cdot w$ leading to length $|v| = \sqrt{v \cdot v}$,
three non-zero vectors $v, w, v-w$ define a right angle triangle if $v$ and $w$ are perpendicular,
meaning that $v \cdot w = 0$. If $a = |v|$, $b = |w|$, $c = |v-w|$ are the lengths of the three vectors, then
the Pythagoras theorem is


Theorem: $a^2 + b^2 = c^2$.


Anticipated by Babylonian mathematicians in examples, it appeared independently also in
Chinese mathematics [644] and might have been proven first by Pythagoras, but already
early sources express uncertainty (see e.g. [361] p. 32). The theorem is used in many parts of
mathematics, like in the Parseval equality of Fourier theory or in the fact that for uncorrelated
random variables the variance is additive: $\mathrm{Var}[X] + \mathrm{Var}[Y] = \mathrm{Var}[X+Y]$. In linear algebra it
generalizes to the Lagrange identity $\det(F^T F) = \sum_P \det^2(F_P)$ which holds for all $n \times m$ matrices $F$,
where the sum to the right is over all $m \times m$ sub-matrices $F_P$ of $F$ [32], a formula which in
calculus becomes $|v|^2 |w|^2 - (v \cdot w)^2 = |v \wedge w|^2$. See [866] [548] [461] [374].
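
The Lagrange identity can be checked by hand for a matrix $F$ with two columns $v, w$. The sketch below uses integer vectors so that the arithmetic is exact; the sample vectors and the helper names are assumptions made for illustration.

```python
from itertools import combinations

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def gram_determinant(v, w):
    """det(F^T F) for the matrix F with the two columns v and w."""
    return dot(v, v) * dot(w, w) - dot(v, w) ** 2

def sum_of_squared_minors(v, w):
    """Sum of det^2 over all 2 x 2 sub-matrices of F (row pairs i < j)."""
    return sum((v[i] * w[j] - v[j] * w[i]) ** 2
               for i, j in combinations(range(len(v)), 2))

v, w = (1, 2, 3, 4), (2, -1, 0, 5)
print(gram_determinant(v, w), sum_of_squared_minors(v, w))   # 500 500
```

For a single column $v$, the same identity reduces to $|v|^2 = \sum_i v_i^2$, which is the Pythagoras theorem in coordinates.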


3. CALCULUS 


Let $f$ be a function of one variable which is continuously differentiable, meaning that
the limit $g(x) = \lim_{h \to 0} [f(x+h) - f(x)]/h$ exists at every point $x$ and defines a continuous
function $g$. For any such function $f$, we can form the integral $\int_a^b f(t) \, dt$ and the derivative
$\frac{d}{dx} f(x) = f'(x)$.


Theorem: $\int_a^b f'(x) \, dx = f(b) - f(a)$, $\frac{d}{dx} \int_0^x f(t) \, dt = f(x)$


Newton and Leibniz discovered the result independently. Gregory wrote down the first proof
in his “Geometriae Pars Universalis” of 1668. The result generalizes to higher dimensions in
the form of the Green-Stokes-Gauss-Ostrogradski theorem $\int_M dF = \int_{\delta M} F$ which holds
for $n$-forms $F$ with exterior derivative $dF$ and compact $(n+1)$-manifolds $M$ with boundary
$\delta M$. [202] tells the “tongue in cheek” proof: as the derivative is a limit of quotients
of differences, the anti-derivative must be a limit of sums of products. For history, see
[373] [199].
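
The first identity of the theorem can be tested numerically by replacing the integral with a Riemann sum. This is a sketch only: the choice $f = \sin$ on $[0, 2]$ and the midpoint rule are assumptions made for the example.

```python
import math

def riemann_sum(g, a, b, n=100000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

f = math.sin       # sample function, chosen for illustration
df = math.cos      # its derivative
a, b = 0.0, 2.0

lhs = riemann_sum(df, a, b)    # approximates the integral of f' over [a, b]
rhs = f(b) - f(a)
print(abs(lhs - rhs) < 1e-8)   # True
```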


4. ALGEBRA 


A polynomial is a complex-valued function of the form $f(x) = a_0 + a_1 x + \cdots + a_n x^n$, where
the entries $a_k$ are in the complex plane $\mathbb{C}$. The space of all polynomials is denoted by $\mathbb{C}[x]$. The
largest non-negative integer $n$ for which $a_n \neq 0$ is called the degree of the polynomial. Degree
1 polynomials are linear, degree 2 polynomials are called quadratic etc. The fundamental
theorem of algebra is


Theorem: Every $f \in \mathbb{C}[x]$ of degree $n$ can be factored into $n$ linear factors.


This result was anticipated during the 17th century. The first author to assert that any n’th
degree polynomial has a root is Peter Roth in 1600 [544]. This was proven first by Carl Friedrich
Gauss and finalized in 1920 by Alexander Ostrowski who fixed a topological mistake in Gauss’s
proof. The theorem assures that the field of complex numbers $\mathbb{C}$ is algebraically closed. For
history and many proofs see [222].
5. PROBABILITY


Let $X_k$ be a sequence of independent random variables on a probability space $(\Omega, \mathcal{A}, P)$
which all have the same cumulative distribution function $F_X(t) = P[X \leq t]$ with finite,
non-zero variance. The normalized random variable is $\overline{X} = (X - E[X])/\sigma[X]$, where $E[X]$ is
the mean $\int_\Omega X(\omega) \, dP(\omega)$ and $\sigma[X] = E[(X - E[X])^2]^{1/2}$ is the standard deviation. A sequence
of random variables $Z_n$ converges in distribution to $Z$ if $F_{Z_n}(t) \to F_Z(t)$ for all $t$ as $n \to \infty$.
If $Z$ is a Gaussian random variable with zero mean $E[Z] = 0$ and standard deviation $\sigma[Z] = 1$, the
central limit theorem is:


Theorem: $\overline{(X_1 + X_2 + \cdots + X_n)} \to Z$ in distribution.


Proven in a special case by Abraham de Moivre for discrete random variables and then by
Constantin Carathéodory and Paul Lévy, the theorem explains the importance and ubiquity
of the Gaussian density function $e^{-x^2/2}/\sqrt{2\pi}$ defining the normal distribution. The
Gaussian distribution was first considered by Abraham de Moivre in 1738. See [396].
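
The discrete special case can be explored exactly, without simulation. The sketch below assumes fair coin flips $X_k \in \{0, 1\}$, so that the sum is binomial with mean $n/2$ and standard deviation $\sqrt{n}/2$; the function names are hypothetical.

```python
import math

def normalized_binomial_cdf(n, t):
    """P[(S_n - n/2) / (sqrt(n)/2) <= t] for S_n a sum of n fair coin flips."""
    k_max = math.floor(n / 2 + t * math.sqrt(n) / 2)
    k_max = min(max(k_max, -1), n)
    favorable = sum(math.comb(n, k) for k in range(k_max + 1))
    return favorable / 2 ** n

def normal_cdf(t):
    """Cumulative distribution function of the standard Gaussian Z."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

for n in (10, 100, 1000):
    print(n, abs(normalized_binomial_cdf(n, 1.0) - normal_cdf(1.0)))
```

The printed gaps between the two distribution functions at $t = 1$ give a feeling for the speed of convergence; because the binomial distribution lives on a lattice, the gap is not expected to vanish faster than of order $1/\sqrt{n}$.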


6. DYNAMICS 


Assume $X$ is a random variable on a probability space $(\Omega, \mathcal{A}, P)$ for which $|X|$ has finite
mean $E[|X|]$. This means $X : \Omega \to \mathbb{R}$ is measurable and $\int_\Omega |X(x)| \, dP(x)$ is finite. Let $T$ be an
ergodic, measure-preserving transformation from $\Omega$ to $\Omega$. Measure preserving means that
$P[T^{-1}(A)] = P[A]$ for all measurable sets $A \in \mathcal{A}$. Ergodic means that $T(A) = A$
implies $P[A] = 0$ or $P[A] = 1$ for all $A \in \mathcal{A}$. The ergodic theorem states that for an ergodic
transformation $T$ one has:


Theorem: $[X(x) + X(Tx) + \cdots + X(T^{n-1}(x))]/n \to E[X]$ for almost all $x$.


This theorem from 1931 is due to George Birkhoff and is called Birkhoff’s pointwise ergodic
theorem. It assures that “time averages” are equal to “space averages”. A draft of the von
Neumann mean ergodic theorem, which John von Neumann published in 1932, had
motivated Birkhoff, but the mean ergodic version is weaker. See for history. A special
case is the law of large numbers, in which case the random variables $x \to X(T^k(x))$ are
independent with equal distribution (IID). The theorem belongs to ergodic theory [292] [148]
[608].
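
A standard example of an ergodic, measure-preserving map is the rotation $T(x) = x + \alpha \bmod 1$ of the circle with irrational $\alpha$. The sketch below is an illustration under that assumption, with the observable $X(x) = \cos^2(2\pi x)$ whose space average is $1/2$; neither choice is taken from the text above.

```python
import math

alpha = (math.sqrt(5) - 1) / 2     # irrational rotation number (assumed example)

def T(x):
    """Rotation of the circle R/Z; it preserves the length measure."""
    return (x + alpha) % 1.0

def X(x):
    """Observable with space average E[X] = 1/2."""
    return math.cos(2 * math.pi * x) ** 2

def time_average(x, n):
    """[X(x) + X(Tx) + ... + X(T^(n-1) x)] / n"""
    total = 0.0
    for _ in range(n):
        total += X(x)
        x = T(x)
    return total / n

print(time_average(0.1, 10000))    # close to 0.5
```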


7. SET THEORY 


A bijection is a map $f$ from a set $X$ to a set $Y$ which is injective: $f(x) = f(y) \Rightarrow x = y$ and
surjective: for every $y \in Y$, there exists $x \in X$ with $f(x) = y$. Two sets $X, Y$ have the same
cardinality, if there exists a bijection from $X$ to $Y$. Given a set $X$, the power set $2^X$ is the
set of all subsets of $X$, including the empty set and $X$ itself. If $X$ has $n$ elements, the power
set has $2^n$ elements. Cantor’s theorem is


Theorem: For any set $X$, the sets $X$ and $2^X$ have different cardinality.


The result is due to Cantor. Taking for $X$ the natural numbers, every $Y \in 2^X$ defines a
real number $\phi(Y) = \sum_{y \in Y} 2^{-(y+1)} \in [0, 1]$. As $2^X$ and $[0, 1]$ have the same cardinality (the double
counting pair cases like $0.39999999\dots = 0.400000\dots$ form a countable set), the interval $[0, 1]$
is uncountable. There are different types of infinities leading to countable infinite sets and
uncountable infinite sets. In order to compare sets, the Schröder-Bernstein theorem is
important: if there exist injective functions $f : X \to Y$ and $g : Y \to X$, then there exists also
a bijection $X \to Y$. This result was used by Cantor already. For literature, see [293].
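
The usual diagonal argument behind Cantor's theorem can be tested exhaustively on a small finite set: for any map $f : X \to 2^X$, the set $D = \{x \in X \mid x \notin f(x)\}$ is not in the image of $f$. The following sketch, with hypothetical helper names, enumerates all such maps for a three-element set.

```python
from itertools import combinations, product

def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(X, f):
    """Cantor's set D = {x in X : x not in f(x)} for a map f: X -> 2^X."""
    return frozenset(x for x in X if x not in f[x])

X = [0, 1, 2]
subsets = power_set(X)
# Enumerate every map f: X -> 2^X and check that D is never hit by f.
for images in product(subsets, repeat=len(X)):
    f = dict(zip(X, images))
    assert diagonal_set(X, f) not in f.values()
print(len(subsets))   # 8
```

Since no map is surjective, none is a bijection, which is the statement of the theorem for this $X$.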


8. STATISTICS 


A probability space $(\Omega, \mathcal{A}, P)$ consists of a set $\Omega$, a $\sigma$-algebra $\mathcal{A}$ and a probability
measure $P$. A $\sigma$-algebra is a collection of subsets of $\Omega$ which contains the empty set and which
is closed under the operations of taking complements, countable unions and countable
intersections. The function $P$ on $\mathcal{A}$ takes values in the interval $[0, 1]$, satisfies $P[\Omega] = 1$ and
$P[\bigcup_{A \in S} A] = \sum_{A \in S} P[A]$ for any finite or countable set $S \subset \mathcal{A}$ of pairwise disjoint sets. The
elements in $\mathcal{A}$ are called events. Given two events $A, B$ where $B$ satisfies $P[B] > 0$, one can
define the conditional probability $P[A|B] = P[A \cap B]/P[B]$. Bayes theorem states:


Theorem: $P[A|B] = P[B|A] P[A]/P[B]$


The setup states the Kolmogorov axioms of Andrey Kolmogorov, who wrote in 1933 the
“Grundbegriffe der Wahrscheinlichkeitsrechnung” [414] based on measure theory built by Émile
Borel and Henri Lebesgue. For history, see [596], who report that “Kolmogorov sat down
to write the Grundbegriffe, in a rented cottage on the Klyaz’ma River in November 1932”.
Bayes theorem is rather a fantastically clever definition and not really a theorem. There is
almost nothing to prove, as multiplying with $P[B]$ gives $P[A \cap B]$ on both sides. It essentially
restates that $A \cap B = B \cap A$, the Abelian property of the product in the ring $\mathcal{A}$. More
general is the statement that if $A_1, \dots, A_n$ is a disjoint set of events whose union is $\Omega$, then
$P[A_i|B] = P[B|A_i] P[A_i]/(\sum_j P[B|A_j] P[A_j])$. Bayes theorem was first noticed in 1763 by Thomas
Bayes. It is by some considered to the theory of probability what the Pythagoras theorem is to
geometry. “Monty Hall” type stories [569] illustrate that conditional probability is not always
intuitive.
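
The general form of Bayes theorem can be applied to a Monty Hall type story with exact fractions. The setup below is an assumed version of the game: the player has picked door 1, $A_i$ is the event that the car is behind door $i$, and $B$ is the event that the host opens door 3; the likelihoods $P[B|A_i]$ are modeling assumptions of this sketch.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes rule P[A_i|B] = P[B|A_i] P[A_i] / sum_j P[B|A_j] P[A_j]."""
    evidence = sum(l * p for l, p in zip(likelihood, prior))   # P[B]
    return [l * p / evidence for l, p in zip(likelihood, prior)]

prior = [Fraction(1, 3)] * 3                               # P[A_1], P[A_2], P[A_3]
likelihood = [Fraction(1, 2), Fraction(1), Fraction(0)]    # P[B|A_i]
print(posterior(prior, likelihood))
# [Fraction(1, 3), Fraction(2, 3), Fraction(0, 1)]
```

Under these assumptions, switching to door 2 wins with probability $2/3$.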


9. GRAPH THEORY 


A finite simple graph $G = (V, E)$ is a finite collection $V$ of vertices connected by a finite
collection $E$ of edges, which are un-ordered pairs $(a, b)$ with $a, b \in V$. Simple means that no
self-loops nor multiple connections are present in the graph. The vertex degree $d(x)$ of
$x \in V$ is the number of edges containing $x$.


Theorem: $\sum_{x \in V} d(x)/2 = |E|$.


This formula is also called the Euler handshake formula because every edge in a graph
contributes exactly two handshakes. It can be seen as a Gauss-Bonnet formula for the
valuation $G \to v_1(G)$ counting the number of edges in $G$. A valuation $\phi$ is a function
defined on sub-graphs with the property that $\phi(A \cup B) = \phi(A) + \phi(B) - \phi(A \cap B)$. Examples
of valuations are the number $v_k(G)$ of complete sub-graphs of dimension $k$ of $G$. Another
example is the Euler characteristic $\chi(G) = v_0(G) - v_1(G) + v_2(G) - v_3(G) + \cdots + (-1)^d v_d(G)$.
If we write $d_k(x) = v_{k-1}(S(x))$, where $S(x)$ is the unit sphere of $x$, then $\sum_{x \in V} d_k(x)/(k+1) =
v_k(G)$ is the generalized handshake formula, the Gauss-Bonnet result for $v_k$. The Euler
characteristic then satisfies $\sum_{x \in V} K(x) = \chi(G)$, where $K(x) = \sum_{k=0}^{\infty} (-1)^k v_{k-1}(S(x))/(k+1)$
(with the convention $v_{-1} = 1$). This is the discrete Gauss-Bonnet result. The handshake result
is the special case for the valuation $v_1(G)$ counting the number of edges of a graph $G$. It was
found by Euler and has by some been called the fundamental theorem of graph theory. For more
about graph theory, see [70]; about Euler, see [220].
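
The handshake formula is easy to verify on a concrete graph. The example graph below (a 4-cycle with one diagonal) and the function name `degrees` are assumptions made for illustration.

```python
def degrees(vertices, edges):
    """Return the vertex degree d(x) for every vertex x."""
    d = {v: 0 for v in vertices}
    for a, b in edges:
        d[a] += 1      # each edge contributes one handshake at a ...
        d[b] += 1      # ... and one handshake at b
    return d

V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
d = degrees(V, E)
print(d, sum(d.values()) // 2, len(E))   # {1: 3, 2: 2, 3: 3, 4: 2} 5 5
```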


10. POLYHEDRA 


A finite simple graph $G = (V, E)$ is given by a finite vertex set $V$ and edge set $E$. A subset
$W$ of $V$ generates the sub-graph $(W, \{\{a, b\} \in E \mid a, b \in W\})$. The unit sphere of $v \in V$
is the sub-graph generated by $S(v) = \{y \in V \mid \{y, v\} \in E\}$. The empty graph $0 = (\emptyset, \emptyset)$
is called the $(-1)$-sphere. The 1-point graph $1 = (\{1\}, \emptyset) = K_1$ is the smallest contractible
graph. Inductively, a graph $G$ is contractible, if it is either $1$ or if there exists $x \in V$ such that
both $G - x$ and $S(x)$ are contractible. Inductively, a graph $G$ is a $d$-sphere, if it is either $0$ or
if every $S(x)$ is a $(d-1)$-sphere and if there exists a vertex $x$ such that $G - x$ is contractible.
Let $v_k$ denote the number of complete sub-graphs $K_{k+1}$ of $G$. The vector $(v_0, v_1, \dots)$ is the
f-vector of $G$ and $\chi(G) = v_0 - v_1 + v_2 - \dots$ is the Euler characteristic of $G$. The generalized
Euler gem formula due to Schläfli is:


Theorem: For $d = 2$, $\chi(G) = v - e + f = 2$. For $d$-spheres, $\chi(G) = 1 + (-1)^d$.


Convex polytopes were studied already in ancient Greece. The Euler characteristic relations
were discovered in dimension 2 by Descartes and interpreted topologically by Euler who
proved the case $d = 2$. This is written as $v - e + f = 2$, where $v = v_0$, $e = v_1$, $f = v_2$. The
two-dimensional case can be stated for planar graphs, where one has a clear notion of what
the two dimensional cells are and can use the topology of the ambient sphere in which the graph
is embedded. Historically there had been confusions about the definitions. It was
Ludwig Schläfli [588] who covered the higher dimensional case. The above set-up is a modern
reformulation of his set-up, due essentially to Alexander Evako. Multiple refutations can
be blamed on ambiguous definitions. Polytopes are often defined through convexity [278] [718]
and there is not much consensus on a general definition [277], which was the reason in this
entry to formulate Schläfli’s theorem in a rather restrictive case (where all cells are simplices),
but where we have a simple combinatorial definition of what a “sphere” is. See also [353].
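
For a small graph, the f-vector and the Euler characteristic can be computed by brute force over all vertex subsets. The sketch below uses the octahedron graph as an assumed example of a 2-sphere; the function names are hypothetical and the clique search is only practical for tiny graphs.

```python
from itertools import combinations

def f_vector(vertices, edges):
    """Return [v_0, v_1, ...] where v_k counts complete sub-graphs K_{k+1}."""
    edge_set = {frozenset(e) for e in edges}
    counts = []
    for size in range(1, len(vertices) + 1):
        cliques = sum(1 for c in combinations(vertices, size)
                      if all(frozenset(p) in edge_set for p in combinations(c, 2)))
        if cliques == 0:       # no K_size implies no larger complete sub-graph
            break
        counts.append(cliques)
    return counts

def euler_characteristic(vertices, edges):
    return sum((-1) ** k * v for k, v in enumerate(f_vector(vertices, edges)))

# Octahedron graph: all pairs of the 6 vertices are joined except antipodal ones.
V = range(6)
antipodal = {frozenset((0, 1)), frozenset((2, 3)), frozenset((4, 5))}
E = [p for p in combinations(V, 2) if frozenset(p) not in antipodal]
print(f_vector(V, E), euler_characteristic(V, E))   # [6, 12, 8] 2
```

The output $v - e + f = 6 - 12 + 8 = 2$ agrees with the case $d = 2$ of the theorem.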


11. TOPOLOGY 


The Zorn lemma assures that the Cartesian product of a non-empty family of non-empty
sets is non-empty. The Zorn lemma is equivalent to the axiom of choice C in the Zermelo-
Fraenkel ZFC axiom system and also equivalent to the Tychonov theorem in topology.
Let $X = \prod_{i \in I} X_i$ denote the product of topological spaces. The product topology is the
weakest topology on $X$ which renders all projection functions $\pi_i : X \to X_i$ continuous.
Here is Tychonov’s theorem:


Theorem: If all $X_i$ are compact, then $\prod_{i \in I} X_i$ is compact.


Zorn’s lemma is due to Kazimierz Kuratowski in 1922 and Max August Zorn in 1935. Andrey
Nikolayevich Tykhonov proved his theorem in 1930. Applications of the Zorn lemma include the
Hahn-Banach theorem in functional analysis, the existence of spanning trees in infinite
graphs and the fact that commutative rings with units have maximal ideals. For literature, see
3o]]. 
12. ALGEBRAIC GEOMETRY


The algebraic set $V(J)$ of an ideal $J$ in the commutative ring $R = k[x_1, \dots, x_n]$ over an
algebraically closed field $k$ defines the ideal $I(V(J))$ containing all polynomials that vanish
on $V(J)$. The radical $\sqrt{J}$ of an ideal $J$ is the set of polynomials $r$ in $R$ such that $r^n \in J$ for
some positive $n$. [An ideal $J$ in a ring $R$ is a subgroup of the additive group of $R$ such that
$rx \in J$ for all $r \in R$ and all $x \in J$. It defines the quotient ring $R/J$ and is so the kernel of
a ring homomorphism from $R$ to $R/J$. The algebraic set $V(J) = \{x \in k^n \mid f(x) = 0, \forall f \in J\}$
of an ideal $J$ in the polynomial ring $R$ is the set of common roots of all these functions $f$.
The algebraic sets are the closed sets in the Zariski topology of $k^n$. The ring $R/I(V)$ is the
coordinate ring of the algebraic set $V$.] The Hilbert Nullstellensatz is


Theorem: $I(V(J)) = \sqrt{J}$.


The theorem is due to Hilbert from 1893 (page 320). Of course, Hilbert did not yet use the
language of ideals but spoke in terms of “ganze rationalen homogene Funktionen” of several
variables. A simple example is when $J = (p) = (x^2 - 2xy + y^2)$ is the ideal generated by $p$ in
$\mathbb{R}[x, y]$; then $V(J) = \{x = y\}$ and $I(V(J))$ is the ideal generated by $x - y$. For literature, see


13. CRYPTOLOGY 


An integer $p > 1$ is prime if $1$ and $p$ are the only factors of $p$. The number $k \bmod p$ is the
remainder when dividing $k$ by $p$. For example $18 \bmod 7 = 4$. Fermat’s little theorem is


Theorem: $a^p = a \bmod p$ for every prime $p$ and every integer $a$.


The theorem was found by Pierre de Fermat in 1640. A first proof was written down in 1683 by
Leibniz; Euler published the first proof in 1736. The result is used in the Diffie-Hellman key
exchange, where a large public prime $p$ and a public base value $a$ are taken. Ana chooses a
number $x$ and publishes $X = a^x \bmod p$ and Bob picks $y$, publishing $Y = a^y \bmod p$. Their secret
key is $K = X^y = Y^x \bmod p$. An adversary Eve who only knows $a, p, X$ and $Y$ can not get
$K$ from this, due to the difficulty of the discrete log problem. More generally, for possibly
composite numbers $n$, the theorem extends to the fact that $a^{\phi(n)} = 1$ modulo $n$ for all $a$ coprime
to $n$, where Euler’s totient function $\phi(n)$ counts the number of positive integers less than $n$ which
are coprime to $n$. The generalized Fermat theorem is the key for RSA crypto systems: in order
for Ana and Bob to communicate, Bob publishes the product $n = pq$ of two large primes as well
as some integer $a$ coprime to $\phi(n)$. Neither Ana nor any third party Eve knows the factorization.
Ana communicates a message $x$ to Bob by sending $X = x^a \bmod n$ using modular exponentiation.
Bob, who knows $p, q$, can find $y$ such that $ay = 1 \bmod \phi(n)$. Because of the generalized
Fermat theorem $x^{(p-1)(q-1)} = 1 \bmod n$, he can now compute $x = X^y \bmod n$. Not even Ana
herself could recover $x$ from $X$ alone.
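
Both Fermat's little theorem and the Diffie-Hellman exchange can be tried with Python's built-in modular exponentiation `pow(base, exponent, modulus)`. The tiny parameters below ($p = 23$, $a = 5$, secrets $6$ and $15$) are assumptions for illustration only and offer no security.

```python
def fermat_holds(p, a):
    """Check a^p = a mod p for a prime p and an integer a."""
    return pow(a, p, p) == a % p

print(all(fermat_holds(p, a) for p in (2, 3, 5, 7, 11, 13) for a in range(30)))  # True

# Toy Diffie-Hellman exchange with a public prime p and a public base a.
p, a = 23, 5
x, y = 6, 15                 # secret numbers of Ana and Bob
X = pow(a, x, p)             # published by Ana
Y = pow(a, y, p)             # published by Bob
key_ana = pow(Y, x, p)       # Ana computes Y^x mod p
key_bob = pow(X, y, p)       # Bob computes X^y mod p
print(X, Y, key_ana, key_bob)   # 8 19 2 2
```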


14. SPECTRAL THEOREM 


A bounded linear operator $A$ on a Hilbert space is called normal if $AA^* = A^*A$, where
$A^* = \overline{A}^T$ is the adjoint, $A^T$ is the transpose and $\overline{A}$ is the complex conjugate. Examples
of normal operators are self-adjoint operators (meaning $A = A^*$) or unitary operators
(meaning $AA^* = 1$).


Theorem: $A$ is normal if and only if $A$ is unitarily diagonalizable.


In finite dimensions, any unitary $U$ diagonalizing $A$ using $B = U^* A U$ contains an orthonormal
eigenbasis of $A$ as column vectors. The theorem is due to Hilbert. In the self-adjoint case,
all the eigenvalues are real and in the unitary case, all eigenvalues are on the unit circle. The
result allows a functional calculus for normal operators: for any continuous function $f$ and
any bounded normal operator $A$, one can define $f(A) = U f(B) U^*$, if $B = U^* A U$. See [142].


15. NUMBER SYSTEMS 


A monoid is a set $X$ equipped with an associative operation $*$ and an identity element
$1$ satisfying $1 * x = x * 1 = x$ for all $x \in X$. Associativity means $x * (y * z) = (x * y) * z$ for all
$x, y, z \in X$. The monoid structure belongs to a collection of mathematical structures magmas
$\supset$ semigroups $\supset$ monoids $\supset$ groups. A monoid is commutative, if $x * y = y * x$ for
all $x, y \in X$. A group is a monoid in which every element $x$ has an inverse $y$ satisfying
$x * y = y * x = 1$.


Theorem: Every commutative monoid can be extended to a group.


The general result is due to Alexander Grothendieck from around 1957. A more precise
statement is that there is a group containing a homomorphic image of the monoid. It is for
cancellative monoids (the statement $a * x = b * x$ implies $a = b$ in the monoid) that the monoid
is also contained isomorphically inside the group. In general, the homomorphic image can
collapse elements: this already happens for a zero monoid with 3 or more elements defined by
$x * y = 1$ for all $x, y$, which is not cancellative. The group is called the Grothendieck group
completion of the monoid. For example, the additive monoid of natural numbers can be
extended to the group of integers, and the multiplicative monoid of non-zero integers can be
extended to the group of non-zero rational numbers.
The construction of the group is used in K-theory [28] [865]. For insight about the philosophy
of Grothendieck’s mathematics, see [481].


16. COMBINATORICS 


Let $|X|$ denote the cardinality of a finite set $X$. This means that $|X|$ is the number of elements
in $X$. A function $f$ from a set $X$ to a set $Y$ is called injective if $f(x) = f(y)$ implies $x = y$.
The pigeonhole principle states:


Theorem: If $|X| > |Y|$ then no function $X \to Y$ can be injective.


This implies that if we place $n$ items into $m$ boxes and $n > m$, then one box must contain
more than one item. The principle is believed to have been formalized first by Peter Dirichlet.
Despite its simplicity, the principle has many applications, like proving that something exists. An
example is the statement that there are two trees in New York City streets which have the
same number of leaves. The reason is that the U.S. Forest Service counted 592’130 trees in
the year 2006 and that a mature, healthy tree has about 200’000 leaves. One can also use
it for less trivial statements, like that in a cocktail party there are at least two guests with the same
number of friends present at the party. A mathematical application is the Chinese remainder
theorem stating that there exists a solution to $a_i x = b_i \bmod m_i$ for all $i$, if all distinct pairs
$m_i, m_j$ and all pairs $a_i, m_i$ are relatively prime [467]. The principle generalizes to infinite sets if
$|X|$ is the cardinality. It implies then for example that there is no injective function from the
real numbers to the integers. For literature, see for example [93], which states also a stronger
version which for example allows to show that any sequence of $n^2 + 1$ real numbers contains
either an increasing subsequence of length $n + 1$ or a decreasing subsequence of length $n + 1$.


17. COMPLEX ANALYSIS 


Assume $f$ is an analytic function in an open domain $G$ of the complex plane $\mathbb{C}$. Such
a function is also called holomorphic in $G$. Holomorphic means that if $f(x + iy) = u(x +
iy) + iv(x + iy)$, then the Cauchy-Riemann differential equations $u_x = v_y$, $u_y = -v_x$ hold in
$G$. Assume $a$ is in $G$ and assume $C \subset G$ is a circle $a + re^{i\theta}$ centered at $a$ which is bounding a
disc $D = \{z \in \mathbb{C} \mid |z - a| < r\} \subset G$.


Theorem: For analytic $f$ and circle $C \subset G$, one has $f(a) = \frac{1}{2\pi i} \int_C \frac{f(z)}{z - a} \, dz$.


This Cauchy integral formula is used for other results and estimates. It implies
for example the Cauchy integral theorem assuring that $\int_C f(z) \, dz = 0$ for any simple closed
curve $C$ in $G$ bounding a simply connected region $D \subset G$. Morera’s theorem assures that
for any domain $G$, if $\int_C f(z) \, dz = 0$ for all simple closed smooth curves $C$ in $G$, then $f$ is
holomorphic in $G$. An application is residue calculus: for a simply connected region
$G$ and a function $f$ which is analytic except in a finite set $A$ of points, if $C$ is a piecewise smooth
continuous closed curve not intersecting $A$, then $\int_C f(z) \, dz = 2\pi i \sum_{a \in A} I(C, a) \mathrm{Res}(f, a)$, where
$I(C, a)$ is the winding number of $C$ with respect to $a$ and $\mathrm{Res}(f, a)$ is the residue of $f$ at $a$
which is in the case of simple poles given by $\lim_{z \to a} (z - a) f(z)$. See [147].
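
The Cauchy integral formula can be checked numerically by discretizing the circle $C$. In the sketch below, the test function $f = \exp$, the center $a$ and the equally spaced quadrature are assumptions chosen for illustration.

```python
import cmath
import math

def cauchy_integral(f, a, r, n=2000):
    """Approximate (1/(2 pi i)) * integral over |z - a| = r of f(z)/(z - a) dz."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = a + r * cmath.exp(1j * theta)                        # point on the circle
        dz = 1j * r * cmath.exp(1j * theta) * (2 * math.pi / n)  # z'(theta) d(theta)
        total += f(z) / (z - a) * dz
    return total / (2j * math.pi)

a = 0.3 + 0.2j
print(abs(cauchy_integral(cmath.exp, a, 1.0) - cmath.exp(a)) < 1e-10)   # True
```

With this parametrization the integral becomes the average of $f$ over the circle, which is why the value at the center is recovered.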


18. LINEAR ALGEBRA 


Let $A$ be an $m \times n$ matrix with image $\mathrm{ran}(A)$ and kernel $\ker(A)$. If $V$ is a linear subspace of
$\mathbb{R}^m$, then $V^\perp$ denotes the orthogonal complement of $V$ in $\mathbb{R}^m$, the linear space of vectors
perpendicular to all $x \in V$.


Theorem: $\dim(\ker A) + \dim(\mathrm{ran} A) = n$, $\dim((\mathrm{ran} A)^\perp) = \dim(\ker A^T)$.


The result is used in data fitting, for example when understanding the least square solution
$x = (A^T A)^{-1} A^T b$ of a system of linear equations $Ax = b$. It assures that $A^T A$ is
invertible if $A$ has a trivial kernel. The result is a bit stronger than the rank-nullity theorem
$\dim(\mathrm{ran}(A)) + \dim(\ker(A)) = n$ alone and implies that for finite $m \times n$ matrices the index
$\dim(\ker A) - \dim(\ker A^*)$ is always $n - m$, which is the value for the $0$ matrix. For literature, see
[635]. The result has an abstract generalization in the form of the group isomorphism theorem
for a group homomorphism $f$ stating that $G/\ker(f)$ is isomorphic to $f(G)$. It can also be
described using the singular value decomposition $A = U D V^T$. With $r = \dim(\mathrm{ran} A)$, the space
$\mathrm{ran} A$ has as a basis the first $r$ columns of $U$. The space $\ker A$ of dimension $n - r$ has as a
basis the last $n - r$ columns of $V$. The space $\mathrm{ran} A^T$ has as a basis the first $r$ columns of $V$.
The space $\ker A^T$ has as a basis the last $m - r$ columns of $U$.


19. DIFFERENTIAL EQUATIONS 


A differential equation $\frac{d}{dt} x = f(x)$ and $x(0) = x_0$ in a Banach space $(X, \|\cdot\|)$ (a normed,
complete vector space) defines an initial value problem: we look for a solution $x(t)$ satisfying
the equation and given initial condition $x(0) = x_0$ and $t \in (-a, a)$ for some $a > 0$. A function
$f$ from $X$ to $X$ is called Lipschitz, if there exists a constant $C$ such that for all $x, y \in X$ the
inequality $\|f(x) - f(y)\| \leq C \|x - y\|$ holds.


Theorem: If $f$ is Lipschitz, a unique solution of $x' = f(x)$, $x(0) = x_0$ exists.


This result is due to Picard and Lindelöf from 1894. Replacing the Lipschitz condition with
continuity still gives (in finite dimensions) an existence theorem which is due to Giuseppe Peano
in 1886, but uniqueness can fail like for $x' = \sqrt{x}$, $x(0) = 0$ with solutions $x = 0$ and $x(t) = t^2/4$. The
example $x'(t) = x^2(t)$, $x(0) = 1$ with solution $1/(1 - t)$ shows that we can not have solutions
for all $t$. The proof is a simple application of the Banach fixed point theorem. For literature,
see [132].
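
The fixed point argument can be made concrete with the Picard iteration $x_{k+1}(t) = x_0 + \int_0^t f(x_k(s)) \, ds$. The sketch below assumes the linear example $f(x) = x$, $x_0 = 1$, for which each iterate is a polynomial that can be stored exactly as a list of coefficients; the helper name `picard_step` is hypothetical.

```python
from fractions import Fraction

def picard_step(coeffs, x0):
    """One Picard iteration for x' = x: return x0 + integral_0^t p(s) ds,
    where p is the polynomial with coefficient list coeffs (lowest degree first)."""
    integrated = [c / (k + 1) for k, c in enumerate(coeffs)]   # integrate term by term
    return [Fraction(x0)] + integrated

p = [Fraction(1)]            # start with the constant function x(t) = 1
for _ in range(5):
    p = picard_step(p, 1)
print(p)   # coefficients 1, 1, 1/2, 1/6, 1/24, 1/120 of the Taylor series of exp(t)
```

In this example, the iterates are the partial sums of the exponential series, which converge to the unique solution $x(t) = e^t$.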


20. LOGIC 


An axiom system $A$ is a collection of formal statements assumed to be true. We assume it to
contain the basic Peano axioms of arithmetic. (One only needs first order Peano arithmetic
PA; for the first incompleteness theorem one can even do with the weaker Robinson arithmetic.)
An axiom system is complete, if every true statement can be proven within the system. The
system is consistent if one can not prove $1 = 0$ within the system. It is provably consistent
if one can prove a theorem “The axiom system $A$ is consistent.” within the system. It is
important that the axiom system is strong enough to contain the Peano arithmetic, as there are
interesting and widely studied theories that happen to be complete, such as the theory of real
closed fields.


Theorem: A consistent axiom system is neither complete nor provably consistent.


The result is due to Kurt Gödel who proved it in 1931. In his thesis, Gödel had proven
a completeness theorem of first order predicate logic. The incompleteness theorems of 1931
destroyed the dream of Hilbert’s program which aimed for a complete and consistent axiom
system for mathematics. A commonly assumed axiom system is the Zermelo-Fraenkel axiom
system together with the axiom of choice ZFC. Other examples are Quine’s new foundations
NF or Lawvere’s elementary theory of the category of sets ETCS. For a modern view on
Hilbert’s program, see [655]. For Gödel’s theorem, see [508]. Hardly any other theorem had so
much impact outside of mathematics.


21. REPRESENTATION THEORY 


For a finite group or compact topological group $G$, one can look at representations,
group homomorphisms from $G$ to the automorphisms of a vector space $V$. A representation
of $G$ is irreducible if the only $G$-invariant subspaces of $V$ are $0$ or $V$. The direct sum of
two representations $\phi, \psi$ is defined as $(\phi \oplus \psi)(g)(v \oplus w) = \phi(g)(v) \oplus \psi(g)(w)$. A representation is
semi simple if it is a unique direct sum of irreducible finite-dimensional representations:


Theorem: Representations of compact topological groups are semi simple.


For representation theory, see [690]. Pioneers in representation theory were Ferdinand Georg
Frobenius, Hermann Weyl, and Élie Cartan. Examples of compact groups are finite groups, or
compact Lie groups (a smooth manifold which is also a group for which the multiplications
and inverse operations are smooth) like the torus group $T^n$, the orthogonal groups $O(n)$ of
all orthogonal $n \times n$ matrices or the unitary groups $U(n)$ of all unitary $n \times n$ matrices or the
group $Sp(n)$ of all symplectic $n \times n$ matrices. Examples of groups that are not Lie groups are
the groups $\mathbb{Z}_p$ of p-adic integers, which are examples of pro-finite groups.
22. LIE THEORY


Given a topological group $G$, a Borel measure $\mu$ on $G$ is called left invariant if $\mu(gA) =
\mu(A)$ for every $g \in G$ and every measurable set $A \subset G$. A left-invariant measure on $G$ is also
called a Haar measure. A topological space is called locally compact, if every point has a
compact neighborhood.


Theorem: A locally compact group has a unique Haar measure.


Alfréd Haar showed the existence in 1933 and John von Neumann proved that it is unique
(up to a positive scaling factor). In the compact case, the measure is finite, leading to an inner
product and so to unitary representations. Locally compact Abelian groups $G$ can be understood
by their characters, continuous group homomorphisms from $G$ to the circle group $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. The
set of characters defines a new locally compact group $\hat{G}$, the dual of $G$. The multiplication is
the pointwise multiplication, the inverse is the complex conjugate and the topology is the one of
uniform convergence on compact sets. If $G$ is compact, then $\hat{G}$ is discrete, and if $G$ is discrete,
then $\hat{G}$ is compact. In order to prove Pontryagin duality $\hat{\hat{G}} = G$, one needs a generalized Fourier
transform $\hat{f}(\chi) = \int_G f(x) \overline{\chi(x)} \, d\mu(x)$ which uses the Haar measure. The inverse Fourier
transform gives back $f(x) = \int_{\hat{G}} \hat{f}(\chi) \chi(x) \, d\hat{\mu}(\chi)$, where $\hat{\mu}$ is the dual Haar measure. The Haar
measure is also used to define the convolution $f * g(x) = \int_G f(x - y) g(y) \, d\mu(y)$ rendering $L^1(G)$
a Banach algebra. The Fourier transform then produces a homomorphism from $L^1(G)$ to $C_0(\hat{G})$
or a unitary transformation from $L^2(G)$ to $L^2(\hat{G})$. For literature, see [124] [678].


23. COMPUTABILITY 


The class of general recursive functions is the smallest class of functions which allows
projection, iteration, composition and minimization. The class of Turing computable
functions consists of the functions which can be implemented by a Turing machine possessing
finitely many states. Turing introduced this in 1936 [539].


Theorem: The generally recursive class is the Turing computable class.


Kurt Gödel and Jacques Herbrand defined the class of general recursive functions around 1933.
They were motivated by work of Alonzo Church who then created $\lambda$ calculus later in 1936.
Alan Turing developed the idea of a Turing machine which allows to replace Herbrand-Gödel
recursion and $\lambda$ calculus. The Church thesis or Church-Turing thesis states that everything
we can compute is generally recursive. As “whatever we can compute” is not formally defined,
this always will remain a thesis unless some more effective computation concept would emerge.


24. CATEGORY THEORY 


Given an object $A$ in a category $C$, let $h^A$ denote the functor which assigns to an object $X$ the
set $\mathrm{Hom}(A, X)$ of all morphisms from $A$ to $X$. Given a functor $F$ from $C$ to the category
$S = \mathrm{Set}$, let $N(G, F)$ be the set of natural transformations from $G = h^A$ to $F$. (A natural
transformation between two functors $G$ and $F$ from $C$ to $S$ assigns to every object $x$ in
$C$ a morphism $\eta_x : G(x) \to F(x)$ such that for every morphism $f : x \to y$ in $C$ we have
$\eta_y \circ G(f) = F(f) \circ \eta_x$.) The functor category defined by $C$ and $S$ has as objects the functors
$F$ and as morphisms the natural transformations. The Yoneda lemma is


Theorem: $N(h^A, F)$ can be identified with $F(A)$.


Category theory was introduced in 1945 by Samuel Eilenberg and Saunders Mac Lane. The
lemma above is due to Nobuo Yoneda from 1954. It allows to see a category embedded in a
functor category which is a topos and serves as a sort of completion. One can identify a
set $S$ for example with $\mathrm{Hom}(1, S)$. Another example is Cayley’s theorem stating that a
group $G$ can be understood as a subgroup of the group of permutations of $G$.
For category theory, see [434]. For history, [425].


25. PERTURBATION THEORY 


A function $f$ of several variables is called smooth if one can take first partial derivatives
like $\partial_x, \partial_y$ and second partial derivatives like $\partial_x \partial_y f(x,y) = f_{xy}(x,y)$ and still have continuous
functions. Assume $f(x,y)$ is a smooth function of two Euclidean variables $x, y \in \mathbb{R}^n$. If
$f(a,0) = 0$, we say $a$ is a root of $x \to f(x,0)$. If $f_x(a,0)$ is invertible, the root is called
non-degenerate. If there is a solution $f(g(y), y) = 0$ such that $g(0) = a$ and $g$ is continuous,
we say that the root $a$ has a local continuation and that it persists under perturbation.


Theorem: A non-degenerate root persists under perturbation. 


This is the implicit function theorem. There are concrete and fast algorithms to compute the
continuation. An example is the Newton method which iterates $T(x) = x - f(x,y)/f_x(x,y)$
to find the roots of $x \to f(x,y)$ for fixed $y$. The importance of the implicit function theorem
is both theoretical and applied. The result assures that one can make statements about
a complicated theory near some model which is understood. There are related situations, like
if we want to continue a solution of $F(x,y) = (f(x,y), g(x,y)) = (0,0)$ giving equilibrium
points of the vector field $F$. Then the Newton step $T(x,y) = (x,y) - dF^{-1}(x,y) \cdot F(x,y)$
allows a continuation if $dF(x,y)$ is invertible. This means that small deformations of
$F$ do not lead to changes of the nature of the equilibrium points. When equilibrium points
change, the system exhibits bifurcations. This in particular applies to $F(x,y) = \nabla f(x,y)$,
where equilibrium points are critical points. The derivative $dF$ of $F$ is then the Hessian.
[423] call it one of the most important and oldest paradigms in modern mathematics for which
the germ of the idea was already formed in the writings of Isaac Newton and Gottfried Leibniz
but only ripened under Augustin-Louis Cauchy to the theorem we know today.
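
The continuation of a non-degenerate root by the Newton method can be sketched in a few lines.
The example function $f(x,y) = x^2 - 1 - y$ with root $a = 1$ at $y = 0$, and the helper names
`newton_root` and `continue_root`, are assumptions chosen for illustration only; the root found
for one parameter value is used as the starting point for the next.

```python
def newton_root(f, fx, y, x0, steps=50, tol=1e-14):
    """Iterate T(x) = x - f(x, y) / f_x(x, y) for fixed y, starting at x0."""
    x = x0
    for _ in range(steps):
        step = f(x, y) / fx(x, y)
        x -= step
        if abs(step) < tol:
            break
    return x


def continue_root(f, fx, a, ys):
    """Follow the root a of x -> f(x, ys[0]) along the parameter values ys."""
    roots = []
    x = a
    for y in ys:
        x = newton_root(f, fx, y, x)  # previous root is the next initial guess
        roots.append(x)
    return roots


# Illustrative choice (an assumption): f(x, y) = x^2 - 1 - y, so g(y) = sqrt(1 + y).
f = lambda x, y: x * x - 1.0 - y
fx = lambda x, y: 2.0 * x  # partial derivative in x, non-zero near x = 1

ys = [0.0, 0.05, 0.10, 0.15, 0.21]
branch = continue_root(f, fx, 1.0, ys)
```

Here `branch` approximates $g(y) = \sqrt{1+y}$ along the chosen parameter values; the continuation
works because $f_x$ stays invertible along the branch.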


26. COUNTING 


A simplicial complex $X$ is a finite set of non-empty sets that is closed under the operation
of taking finite non-empty subsets. The Euler characteristic $\chi$ of a simplicial complex $X$ is
defined as $\chi(X) = \sum_{x \in X} (-1)^{\dim(x)}$, where the dimension $\dim(x)$ of a set $x$ is its cardinality
$|x|$ minus 1.


Theorem: $\chi(X \times Y) = \chi(X)\chi(Y)$.


For zero-dimensional simplicial complexes $X$ (meaning that all sets in $X$ have cardinality
1), we get the rule of product: if you have $m$ ways to do one thing and $n$ ways to do
another, then there are $mn$ ways to do both. This fundamental counting principle is
used in probability theory for example. The Cartesian product $X \times Y$ of two complexes
is defined as the set-theoretical product of the two finite sets. It is not a simplicial complex
any more in general but has the same Euler characteristic as its Barycentric refinement
$(X \times Y)_1$, which is a simplicial complex. If the dimension of $A \times B$ is $\dim(A) + \dim(B)$
and $p_X(t) = \sum_{k \geq 0} v_k(X) t^k$ is the generating function of the numbers $v_k(X)$ of sets of
dimension $k$ in $X$, then $p_{X \times Y}(t) = p_X(t) p_Y(t)$,
implying the counting principle as $p_X(-1) = \chi(X)$. The function $p_X(t)$ is called the Euler
polynomial of $X$. The importance of Euler characteristic as a counting tool lies in the fact
that only $\chi(X) = p_X(-1)$ is invariant under Barycentric subdivision $X \to X_1$, where $X_1$
is the complex which consists of the vertex sets of all complete subgraphs of the graph in which
the sets of $X$ are the vertices and where two are connected if one is contained in the other.
The concept of Euler characteristic carries over to continuum spaces like manifolds where the
product property holds too. See for example [14].
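
The product formula can be checked on small examples. The following is a minimal sketch,
assuming a complex is stored as a set of `frozenset` objects and that a pair $(x,y)$ in the
Cartesian product is given dimension $\dim(x) + \dim(y)$; the function names are hypothetical.

```python
from itertools import combinations


def closure(facets):
    """Simplicial complex generated by the given facets: all non-empty subsets."""
    complex_ = set()
    for facet in facets:
        for k in range(1, len(facet) + 1):
            complex_.update(frozenset(c) for c in combinations(facet, k))
    return complex_


def euler_characteristic(X):
    """Sum of (-1)^dim(x) with dim(x) = |x| - 1."""
    return sum((-1) ** (len(x) - 1) for x in X)


def product_characteristic(X, Y):
    """Same sum over the Cartesian product, with dim(x, y) = dim(x) + dim(y)."""
    return sum((-1) ** (len(x) - 1 + len(y) - 1) for x in X for y in Y)


circle = closure([(0, 1), (1, 2), (0, 2)])   # 3 vertices, 3 edges
segment = closure([(0, 1)])                  # 2 vertices, 1 edge
two_points = closure([(0,), (1,)])           # zero-dimensional complex
triangle = closure([(0, 1, 2)])              # 3 vertices, 3 edges, 1 face
```

For the zero-dimensional complex `two_points` the formula reduces to the rule of product
for counting.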


27. METRIC SPACES 


A continuous map $T : X \to X$, where $(X,d)$ is a complete non-empty metric space, is called
a contraction if there exists a real number $0 < \lambda < 1$ such that $d(T(x),T(y)) \leq \lambda d(x,y)$ for
all $x,y \in X$. The space is called complete if every Cauchy sequence in $X$ has a limit. (A
sequence $x_n$ in $X$ is called Cauchy if for all $\epsilon > 0$, there exists $n > 0$ such that for all $i,j > n$,
one has $d(x_i,x_j) < \epsilon$.)


Theorem: A contraction has a unique fixed point in X. 


This result is the Banach fixed point theorem proven by Stefan Banach in 1922. The
example case $T(x) = (1 - x^2)/2$ on $X = \mathbb{Q} \cap [0.3, 0.6]$ having contraction rate $\lambda = 0.6$ and
$T(X) = \mathbb{Q} \cap [0.32, 0.455] \subset X$ shows that completeness is necessary. The unique fixed point of
$T$ in the real interval $[0.3, 0.6]$ is $\sqrt{2} - 1 = 0.414...$ which is not in $\mathbb{Q}$ because $\sqrt{2} = p/q$ would
imply $2q^2 = p^2$, which
is not possible for integers as the left hand side has an odd number of prime factors 2 while the
right hand side has an even number of prime factors 2. See [529].
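
The example can be explored with exact rational arithmetic. This is a small sketch, assuming
the starting point $1/2$ and 12 iterations (both arbitrary choices): every iterate is a rational
number in $[0.3, 0.6]$, the iterates approach $\sqrt{2} - 1$, but no iterate is a fixed point.

```python
from fractions import Fraction


def T(x):
    """The contraction T(x) = (1 - x^2) / 2 from the text."""
    return (1 - x * x) / 2


x = Fraction(1, 2)          # assumed starting point in X
orbit = [x]
for _ in range(12):         # denominators square in each step, so keep this small
    x = T(x)
    orbit.append(x)

limit = 2 ** 0.5 - 1        # the irrational fixed point sqrt(2) - 1
```

The list `orbit` is a Cauchy sequence in $X$ whose limit lies outside $X$.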


28. DIRICHLET SERIES 


The abscissa of simple convergence of a Dirichlet series $\zeta(s) = \sum_{n=1}^{\infty} a_n e^{-\lambda_n s}$ is
$\sigma_0 = \inf\{a \in \mathbb{R} \mid \zeta(z) \text{ converges for all } \mathrm{Re}(z) > a\}$. For $\lambda_n = n$ we have the Taylor series
$f(z) = \sum_{n=1}^{\infty} a_n z^n$ with $z = e^{-s}$. For $\lambda_n = \log(n)$ we have the standard Dirichlet series
$\sum_{n=1}^{\infty} a_n/n^s$. For example, for $a_n = z^n$, one gets the poly-logarithm $\mathrm{Li}_s(z) = \sum_{n=1}^{\infty} z^n/n^s$ and
especially $\mathrm{Li}_s(1) = \zeta(s)$, the Riemann zeta function or the Lerch transcendent $\Phi(z, s, a) =
\sum_{n=0}^{\infty} z^n/(n + a)^s$. Define $S(n) = \sum_{k=1}^{n} a_k$. Cahen’s formula applies if the series $S(n)$
does not converge.


Theorem: $\sigma_0 = \limsup_{n \to \infty} \frac{\log |S(n)|}{\lambda_n}$.


There is a similar formula for the abscissa of absolute convergence of $\zeta$ which is defined
as $\sigma_a = \inf\{a \in \mathbb{R} \mid \zeta(z) \text{ converges absolutely for all } \mathrm{Re}(z) > a\}$. The result is
$\sigma_a = \limsup_{n \to \infty} \frac{\log \bar{S}(n)}{\lambda_n}$ with $\bar{S}(n) = \sum_{k=1}^{n} |a_k|$. For example, the Dirichlet eta function
$\zeta(s) = \sum_{n=1}^{\infty} (-1)^{n-1}/n^s$
has the abscissa of convergence $\sigma_0 = 0$ and the absolute abscissa of convergence $\sigma_a = 1$. The
series $\zeta(s) = \sum_{n=1}^{\infty} e^{i n^{\alpha}}/n^s$ has $\sigma_a = 1$ and $\sigma_0 = 1 - \alpha$. If $a_n$ is multiplicative, $a_{nm} = a_n a_m$ for
relatively prime $n,m$, then $\sum_{n=1}^{\infty} a_n/n^s = \prod_p (1 + a_p/p^s + a_{p^2}/p^{2s} + \cdots)$ generalizes the Euler
golden key formula $\sum_n 1/n^s = \prod_p (1 - 1/p^s)^{-1}$. See [296] [298].


29. TRIGONOMETRY


Mathematicians had a long and painful struggle with the concept of limit. One of the first to
ponder the question was Zeno of Elea around 450 BC [474]. Archimedes of Syracuse made some
progress around 250 BC. Since Augustin-Louis Cauchy one uses the notion of limits. See
also [263]. Today, one defines the limit $\lim_{x \to a} f(x) = b$ to exist if and only if for all $\epsilon > 0$,
there exists a $\delta > 0$ such that if $0 < |x - a| < \delta$, then $|f(x) - b| < \epsilon$. A place where limits appear
is the computation of derivatives $g'(0) = \lim_{x \to 0} [g(x) - g(0)]/x$. In the case $g(x) = \sin(x)$,
one has to understand the limit of the function $f(x) = \sin(x)/x$ which is the sinc function.
A prototype result is the fundamental theorem of trigonometry (called as such in some
calculus texts like [90]).


Theorem: $\lim_{x \to 0} \sin(x)/x = 1$.


It appears strange to give weight to such a special result but it explains the difficulty of limit
and the l’Hôpital rule of 1694, which was formulated in a book of Bernoulli commissioned by
l’Hôpital: the limit can be obtained by differentiating both the denominator and numerator and
taking the limit of the quotients. The result allows to derive (using trigonometric identities)
that in general $\sin'(x) = \cos(x)$ and $\cos'(x) = -\sin(x)$. One single limit is the gateway. It is
important also culturally because it embraces thousands of years of struggle. It was Archimedes,
who used the theorem when computing the circumference formula $2\pi r$ of the circle by
exhaustion, using regular polygons from the inside and outside. Comparing the lengths of
the approximations essentially meant battling with that fundamental theorem of trigonometry. The
identity is therefore at the epicenter of the development of trigonometry, differentiation and
integration.


30. LOGARITHMS 


The natural logarithm is the inverse of the exponential function $\exp(x)$, which establishes a
group homomorphism from the additive group $(\mathbb{R},+)$ to the multiplicative group $(\mathbb{R}^+, *)$.
We have:


Theorem: $\log(uv) = \log(u) + \log(v)$.


This follows from $\exp(x + y) = \exp(x) \exp(y)$ and $\log(\exp(x)) = \exp(\log(x)) = x$ by plugging
in $x = \log(u)$, $y = \log(v)$. The logarithms were independently discovered by Jost Bürgi around
1600 and John Napier in 1614 [623]. The logarithm with base $b > 0$ is denoted by $\log_b$. It is the
inverse of $x \to b^x = e^{x \log(b)}$. The concept of logarithm has been extended in various ways: in any
group $G$, one can define the discrete logarithm $\log_b(a)$ to base $b$ as an integer $k$ such that
$b^k = a$ (if it exists). For complex numbers the complex logarithm $\log(z)$ is defined as any solution $w$ of
$e^w = z$. It is multi-valued as $\log(|z|) + i \arg(z) + 2\pi i k$ all solve this with some integer $k$, where
$\arg(z) \in (-\pi, \pi]$. The identity $\log(uv) = \log(u) + \log(v)$ is now only true up to $2\pi k i$. Logarithms
can also be defined for matrices. Any matrix $B$ solving $\exp(B) = A$ is called a logarithm of
$A$. For $A$ close to the identity $I$, one can define $\log(A) = (A - I) - (A - I)^2/2 + (A - I)^3/3 - \ldots$,
which is a Mercator series. For normal invertible matrices, one can define logarithms
using the functional calculus by diagonalization. On a Riemannian manifold $M$, one also
has an exponential map: it is a diffeomorphism from a small ball $B_r(0)$ in the tangent space
$T_xM$ at $x \in M$ onto its image in $M$. The map $v \to \exp_x(v)$ is obtained by defining $\exp_x(0) = x$ and by taking for
$v \neq 0$ a geodesic with initial direction $v/|v|$ and running it for time $|v|$. The logarithm $\log_x$
is now defined on a geodesic ball of radius $r$ and defines an element in the tangent space. In
the case of a Lie group $M = G$, where the points are matrices, each tangent space is its Lie
algebra.
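
The discrete logarithm mentioned above can be illustrated by a brute-force search. This is only
a sketch, assuming the group is the multiplicative group of non-zero residues modulo a prime $p$;
the function name `discrete_log` is hypothetical and the search is not meant to be efficient.

```python
def discrete_log(a, b, p):
    """Smallest k >= 0 with b**k == a (mod p), or None if no such k exists."""
    target = a % p
    power = 1 % p
    for k in range(p):
        if power == target:
            return k
        power = (power * b) % p   # next power of b modulo p
    return None


k = discrete_log(13, 3, 17)       # solve 3^k = 13 in the group modulo 17
no_solution = discrete_log(3, 2, 7)   # powers of 2 modulo 7 are only 1, 2, 4
```

As the text notes, the logarithm need not exist: the second call returns `None` because 3 is
not a power of 2 modulo 7.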


31. GEOMETRIC PROBABILITY 


A subset $K$ of $\mathbb{R}^n$ is called compact if it is closed and bounded. By Bolzano-Weierstrass
this is equivalent to the fact that every infinite sequence $x_n$ in $K$ has a subsequence which
converges in $K$. A subset $K$ of $\mathbb{R}^n$ is called convex, if for any two given points $x,y \in K$, the
interval $\{x + t(y - x), t \in [0,1]\}$ is a subset of $K$. Let $G$ be the set of all compact convex
subsets of $\mathbb{R}^n$. An invariant valuation $X$ is a function $X : G \to \mathbb{R}$ satisfying
$X(A \cup B) + X(A \cap B) = X(A) + X(B)$, which is continuous in the Hausdorff metric
$d(K, L) = \max(\sup_{x \in K} \inf_{y \in L} d(x, y), \sup_{y \in L} \inf_{x \in K} d(x, y))$ and invariant under rigid motion generated
by rotations, reflections and translations in the linear space $\mathbb{R}^n$.


Theorem: The space of valuations is (n + 1)-dimensional. 


The theorem is due to Hugo Hadwiger from 1937. The coefficients $a_j(G)$ of the polynomial
$\mathrm{Vol}(G + tB) = \sum_{j=0}^{n} a_j t^j$ are a basis, where $B$ is the unit ball $B = \{|x| \leq 1\}$. See [385].


32. PARTIAL DIFFERENTIAL EQUATIONS 


A quasilinear partial differential equation is a differential equation of the form $u_t(x,t) =
F(x,t,u) \cdot \nabla_x u(x,t) + f(x,t,u)$ with analytic initial condition $u(x,0) = u_0(x)$ and an analytic
vector field $F$. It defines a quasi-linear Cauchy problem.


Theorem: A quasi-linear Cauchy problem has a unique analytic solution. 


This is the Cauchy-Kovalevskaya theorem. It was initiated by Augustin-Louis Cauchy
in 1842 and proven in 1875 by Sophie Kovalevskaya. Analyticity is important, smoothness
alone is not enough. If $F$ is analytic in each variable, one can look at equations like the
Cauchy problem $u_t = F(t, x, u, u_x, u_{xx})$. Examples are partial differential equations like the heat
equation $u_t = u_{xx}$ or the wave equation $u_{tt} = u_{xx}$. Given an initial condition $u(0,x) = u_0(x)$
one then deals with an ordinary differential equation in a function space. One can then try
to approach the Cauchy-Kovalevskaya problem by Picard-Lindelöf. The problem is that the
Lipschitz condition fails because the corresponding operators are unbounded. Even Cauchy-Peano
(which does not ask for uniqueness) fails. And this even in an analytic setting. [519]
gives the example $u_t = u_{xx}$ with initial condition $u(0,x) = 1/(1+x^2)$ for which the entire series
solving the problem has a zero radius of convergence in $x$ for any $t > 0$. Some texts
give full versions of the Cauchy-Kovalevskaya theorem for real-analytic Cauchy initial data on
a real analytic hypersurface satisfying a non-characteristic condition for the partial differential
equation. For a shorter introduction to partial differential equations, see [23].


33. GAME THEORY 


Let $S = (S_1, \ldots, S_n)$ be $n$ players and let $f = (f_1, \ldots, f_n)$ be a payoff function defined on a
strategy profile $x = (x_1, \ldots, x_n)$. A point $x^*$ is called an equilibrium if $f_i(x^*)$ is maximal
with respect to changes of $x_i$ alone in the profile $x$ for every player $i$.


Theorem: There is an equilibrium for any game with mixed strategies.


The equilibrium is called a Nash equilibrium. It tells us what we would see in a world if
everybody is doing their best, given what everybody else is doing. John Forbes Nash used
in 1950 the Kakutani fixed point theorem and later in 1951 the Brouwer fixed point
theorem to prove it. The Brouwer fixed point theorem itself is generalized by the Lefschetz
fixed point theorem which equates the super trace of the induced map on cohomology with
the sum of the indices of the fixed points. About John Nash and some history of game theory,
see [601]: game theory started maybe with Adam Smith’s “The Wealth of Nations" published
in 1776. Ernst Zermelo in 1913 (Zermelo’s theorem), Émile Borel in the 1920s and John von
Neumann in 1928 pioneered mathematical game theory. Together with Oskar Morgenstern,
John von Neumann merged game theory with economics in 1944. Nash published his thesis in
a paper of 1951. For the mathematics of games, see [681].
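
The equilibrium condition can be tested directly for a two-player game given by payoff matrices.
The following sketch assumes a bimatrix game with payoff matrix `A` for the row player and `B`
for the column player; the helper names are hypothetical. Because the expected payoff is linear
in each player's own mixed strategy, it suffices to compare against pure deviations.

```python
from fractions import Fraction


def expected_payoff(M, p, q):
    """Expected payoff sum_ij p_i M_ij q_j for mixed strategies p and q."""
    return sum(p[i] * M[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


def unit(k, n):
    """Pure strategy k among n choices, written as a mixed strategy."""
    return [Fraction(int(i == k)) for i in range(n)]


def is_equilibrium(A, B, p, q):
    """True if no player gains by deviating alone from the profile (p, q)."""
    row_value = expected_payoff(A, p, q)
    col_value = expected_payoff(B, p, q)
    if any(expected_payoff(A, unit(i, len(p)), q) > row_value for i in range(len(p))):
        return False
    if any(expected_payoff(B, p, unit(j, len(q))) > col_value for j in range(len(q))):
        return False
    return True


# Matching pennies (an illustrative choice): the row player wins if the coins match.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
half = Fraction(1, 2)
mixed_ok = is_equilibrium(A, B, [half, half], [half, half])
pure_ok = [is_equilibrium(A, B, unit(i, 2), unit(j, 2)) for i in range(2) for j in range(2)]
```

Matching pennies has no equilibrium in pure strategies, which is why the theorem is stated
for mixed strategies.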


34. MEASURE THEORY 


A topological space with open sets $O$ defines the Borel $\sigma$-algebra, the smallest $\sigma$-algebra
which contains $O$. For the metric space $(\mathbb{R},d)$ with $d(x,y) = |x - y|$, already the intervals
generate the Borel $\sigma$-algebra $A$. A Borel measure is a measure defined on a Borel $\sigma$-algebra.
Every Borel measure $\mu$ on the real line $\mathbb{R}$ can be decomposed uniquely into an absolutely
continuous part $\mu_{ac}$, a singular continuous part $\mu_{sc}$ and a pure point part $\mu_{pp}$:


Theorem: $\mu = \mu_{ac} + \mu_{sc} + \mu_{pp}$.


This is called the Lebesgue decomposition theorem. It uses the Radon-Nikodym the- 
orem. The decomposition theorem implies the decomposition theorem of the spectrum of 
a linear operator. See (like page 259). Lebesgue’s theorem was published in 1904. A 
generalization due to Johann Radon and Otto Nikodym was done in 1913. 


35. GEOMETRIC NUMBER THEORY 


If $\Gamma$ is a lattice in $\mathbb{R}^n$, denote with $\mathbb{R}^n/\Gamma$ the fundamental region and by $|\Gamma|$ its volume. A
set $K$ is convex if $x,y \in K$ implies $x + t(y - x) \in K$ for all $0 \leq t \leq 1$. A set $K$ is centrally
symmetric if $x \in K$ implies $-x \in K$. A region is Minkowski if it is convex and centrally
symmetric. Let $|K|$ denote the volume of $K$.


Theorem: If $K$ is Minkowski and $|K| > 2^n |\Gamma|$ then $K \cap \Gamma \neq \{0\}$.


The theorem is due to Hermann Minkowski in 1896. It led to a field called geometry of
numbers [115]. It has many applications in number theory and Diophantine analysis
[99] [333].


36. FREDHOLM 


An integral kernel $K(x,y) \in L^2([a,b]^2)$ defines an integral operator $T$ defined by $Tf(x) =
\int_a^b K(x,y) f(y) \, dy$ with adjoint $T^* f(x) = \int_a^b \overline{K(y,x)} f(y) \, dy$. The $L^2$ assumption makes the
function $K(x,y)$ what one calls a Hilbert-Schmidt kernel. Fredholm showed that the Fredholm
equation $A^* f = (T^* - \bar{\lambda}) f = g$ has a solution $f$ if and only if $g$ is perpendicular to
the kernel of $A = T - \lambda$. This identity $\ker(A)^{\perp} = \mathrm{im}(A^*)$ is in finite dimensions part of the
fundamental theorem of linear algebra. The Fredholm alternative reformulates this in
a more catchy way as an alternative:


Theorem: Either $\exists f \neq 0$ with $Af = 0$ or for all $g$, $\exists f$ with $Af = g$.


In the second case, the solution depends continuously on $g$. The alternative can be put more
generally by stating that if $A$ is a compact operator on a Hilbert space and $\lambda \neq 0$ is not an
eigenvalue of $A$, then the resolvent $(A - \lambda)^{-1}$ is bounded. A bounded operator $A$ on a Hilbert
space $H$ is called compact if the image of the unit ball is relatively compact (has a compact
closure). The Fredholm alternative is part of Fredholm theory. It was developed by Ivar
Fredholm in 1903.


37. PRIME DISTRIBUTION 


The Dirichlet theorem about the primes along an arithmetic progression tells that if $a$ and $b$
are relatively prime, meaning that their greatest common divisor is 1, then there are infinitely
many primes of the form $p = a \bmod b$. The Green-Tao theorem strengthens this. We say
that a set $A$ contains arbitrarily long arithmetic progressions if for every $k$ there exists an
arithmetic progression $\{a + bj, j = 1, \ldots, k\}$ within $A$.


Theorem: The set of primes contains arbitrarily long arithmetic progressions.


The Dirichlet prime number theorem was found in 1837. The Green-Tao theorem
was done in 2004 and appeared in 2008 [269]. It uses Szemerédi’s theorem [236] which
shows that any set $A$ of positive upper density $\limsup_{n \to \infty} |A \cap \{1, \ldots, n\}|/n$ has arbitrarily long
arithmetic progressions. So, any subset $A$ of the primes $P$ for which the relative density
$\limsup_{n \to \infty} |A \cap \{1, \ldots, n\}|/|P \cap \{1, \ldots, n\}|$ is positive has arbitrarily long arithmetic progressions.
For non-linear sequences of numbers the problems are wide open. The Landau problem of
the infinitude of primes of the form $x^2 + 1$ illustrates this. The Green-Tao theorem gives hope
to tackle the Erdős conjecture on arithmetic progressions telling that a sequence $\{x_n\}$
of integers satisfying $\sum_n 1/x_n = \infty$ contains arbitrarily long arithmetic progressions.
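
For small lengths, progressions of primes can be found by direct search. The sketch below is
an illustration only, with hypothetical function names; it indexes the progression as
$a, a+b, \ldots, a+(k-1)b$, which differs from the text only by a shift of the index $j$.

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes smaller than n (n >= 2 assumed)."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n, p):
                sieve[multiple] = False
    return [k for k in range(n) if sieve[k]]


def prime_progression(k, bound):
    """First arithmetic progression of k primes below bound, or None."""
    primes = primes_below(bound)
    prime_set = set(primes)
    for a in primes:                                  # smallest start first
        for b in range(1, (bound - a) // max(k - 1, 1) + 1):
            terms = [a + b * j for j in range(k)]
            if all(t in prime_set for t in terms):
                return terms
    return None


five = prime_progression(5, 100)
six = prime_progression(6, 200)
```

The search space grows quickly with $k$, so this approach says nothing about the general
theorem; it only exhibits short progressions.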


38. RIEMANNIAN GEOMETRY 


A Riemannian manifold is a smooth finite dimensional manifold $M$ equipped with a smooth,
symmetric, positive definite tensor $g$ defining on each tangent space $T_xM$ an inner
product $\langle u,v \rangle_x = (g(x)u,v) = \sum_{i,j} g_{ij}(x) u^i v^j$. Let $\Omega$ be the space of smooth vector fields
$X$ on $M$. A vector field $X$ acts on smooth functions $f$ as directional derivative $Xf = \delta_X f$. Given
two vector fields $X, Y$ on $M$, one has at each point $x \in M$ a number $g(X,Y) = (g(x)X(x), Y(x))$
so that $g(X,Y)$ is a smooth function on $M$. A connection is a bilinear map $(X,Y) \to
\nabla_X Y$ from $\Omega \times \Omega$ to $\Omega$ satisfying the differentiation rules $\nabla_{fX} Y = f \nabla_X Y$ and Leibniz rule
$\nabla_X (fY) = df(X) Y + f \nabla_X Y$. It is compatible with the metric if the Lie derivative
satisfies $\delta_X g(Y,Z) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z)$. It is torsion-free if $\nabla_X Y - \nabla_Y X = [X,Y]$ is
the Lie bracket on $\Omega$.


Theorem: There is exactly one torsion-free connection compatible with g. 


This is the fundamental theorem of Riemannian geometry. The connection is called the
Levi-Civita connection, named after Tullio Levi-Civita. One proof goes by establishing the
Koszul formula which determines $\nabla_X Y$ explicitly:


$2g(\nabla_X Y, Z) = Xg(Y,Z) + Yg(X,Z) - Zg(X,Y) - g(X,[Y,Z]) - g(Y,[X,Z]) + g(Z,[X,Y])$.
See for example [156].


39. SYMPLECTIC GEOMETRY


A symplectic manifold $(M,\omega)$ is a smooth $2n$-manifold $M$ equipped with a non-degenerate
closed 2-form $\omega$. The latter is called a symplectic form. As a 2-form, it satisfies $\omega(x,y) =
-\omega(y,x)$. Non-degenerate means $\omega(u,v) = 0$ for all $v$ implies $u = 0$. The standard
symplectic form is $\omega_0 = \sum_{i<j} dx_i \wedge dx_j$.

Theorem: Every symplectic form is locally diffeomorphic to $\omega_0$.


This theorem is due to Jean Gaston Darboux from 1882. Modern proofs use Moser’s trick from
1965 (e.g. [317]). The Darboux theorem assures that locally, two symplectic manifolds of the
same dimension are symplectically equivalent. It also implies that symplectic matrices $A$ ($2n \times 2n$
matrices satisfying $A^T J A = J$ with skew symmetric $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$) have determinant 1,
which is not obvious as applying the determinant to $A^T J A = J$ only establishes $\det(A)^2 = 1$.
In contrast, for Riemannian manifolds, one can not trivialize the Riemannian metric in a
neighborhood; one can only render it the standard metric at the point itself.


40. DIFFERENTIAL TOPOLOGY 


Let $f$ be a smooth function on a differentiable manifold $M$ and let $df$ denote the gradient
of $f$. A point $x$ is called a critical point, if $df(x) = 0$. We assume $f$ has only finitely many
critical points and that all of them are non-degenerate. The latter means that the Hessian
$d^2 f(x)$ is invertible at $x$. One calls such functions Morse functions. The Morse index
of a critical point $x$ is the number of negative eigenvalues of $d^2 f(x)$. The Morse inequalities
relate the number $c_k(f)$ of critical points of index $k$ of $f$ with the Betti numbers $b_k(M)$,
defined as the nullity of the Hodge Laplacian $dd^* + d^*d$ restricted to $k$-forms $\Omega_k$, where
$d_k : \Omega_k \to \Omega_{k+1}$ is the exterior derivative.


Theorem: $c_k - c_{k-1} + \cdots + (-1)^k c_0 \geq b_k - b_{k-1} + \cdots + (-1)^k b_0$.


These are the Morse inequalities due to Marston Morse from 1934. They imply in particular
the weak Morse inequalities $b_k \leq c_k$. Modern proofs use Witten deformation of the
exterior derivative $d$.


41. NON-COMMUTATIVE GEOMETRY 


A spectral triple $(A, H, D)$ is given by a Hilbert space $H$, a $C^*$-algebra $A$ of operators
on $H$ and a densely defined self-adjoint operator $D$ satisfying $\|[D,a]\| < \infty$ for all $a \in A$
and such that the operator $e^{-tD^2}$ is trace class. The operator $D$ is called a Dirac operator. The set-up
generalizes Riemannian geometry because of the following result dealing with the exterior
derivative $d$ on a Riemannian manifold $(M,g)$, where $A = C(M)$ is the $C^*$-algebra of continuous
functions and $D = d + d^*$ is the Dirac operator, defining a spectral triple for $(M,g)$. Let $\delta$
denote the geodesic distance in $(M,g)$:


Theorem: $\delta(x,y) = \sup_{f \in A, \|[D,f]\| \leq 1} |f(x) - f(y)|$.


This formula of Alain Connes tells that the spectral triple determines the geodesic distance
in $(M,g)$ and so the metric $g$. It justifies looking at spectral triples as non-commutative
generalizations of Riemannian geometry. See [137].


42. POLYTOPES


A convex polytope $P$ in dimension $n$ is the convex hull of finitely many points in $\mathbb{R}^n$. One
assumes all vertices to be extreme points, points which do not lie in an open line segment
of $P$. The boundary of $P$ is formed by $(n-1)$-dimensional boundary facets. The notion
of Platonic solid is recursive. A convex polytope is Platonic, if all its facets are Platonic
$(n-1)$-dimensional polytopes and vertex figures. Let $p = (p_2, p_3, p_4, \ldots)$ encode the number
of Platonic solids meaning that $p_d$ is the number of Platonic polytopes in dimension $d$.


Theorem: There are 5 Platonic solids and $p = (\infty, 5, 6, 3, 3, 3, \ldots)$.


In dimension 2, there are infinitely many. They are the regular polygons. The list of
Platonic solids, “octahedron", “dodecahedron", “icosahedron", “tetrahedron" and “cube", has
been known by the Greeks already. Ludwig Schläfli first classified the higher dimensional case.
There are six in dimension 4: they are the “5 cell", the “8 cell" (tesseract), the “16 cell", the “24
cell", the “120 cell" and the “600 cell". There are only three regular polytopes in dimension 5
and higher, where only the analog of the tetrahedron, cube and octahedron exist. For literature,
see [278] [718] [560].


43. DESCRIPTIVE SET THEORY 


A metric space $(X,d)$ is a set with a metric $d$ (a function $X \times X \to [0,\infty)$ satisfying
symmetry $d(x,y) = d(y,x)$, the triangle inequality $d(x,y) + d(y,z) \geq d(x,z)$, and $d(x,y) =
0 \Leftrightarrow x = y$.) A metric space $(X,d)$ is complete if every Cauchy sequence converges in $X$. A
metric space is of second Baire category if the intersection of a countable set of open dense
sets is dense. The Baire Category theorem tells


Theorem: Complete metric spaces are of second Baire category. 


One calls the intersection $A$ of a countable set of open dense sets in $X$ also a generic set
or residual set. The complement of a generic set is also called a meager set or negligible
or a set of first category. Such a set is the union of countably many nowhere dense sets.
Like measure theory, Baire category theory can be used to get existence results. There can be
surprises: a generic continuous function is not differentiable for example. For descriptive set
theory, see [377]. The framework for classical descriptive set theory is often that of Polish spaces,
which are separable complete metric spaces. See [84].


44. CALCULUS OF VARIATIONS 


Let X be the vector space of smooth, compactly supported functions h on an interval (a, b). 
The fundamental lemma of calculus of variations tells 


Theorem: If $\int_a^b f(x) g(x) \, dx = 0$ for all $g \in X$, then $f = 0$.


The result is due to Joseph-Louis Lagrange. One can restate this as the fact that if $f = 0$
weakly then $f$ is actually zero. It implies that if $\int_a^b f(x) g'(x) \, dx = 0$ for all $g \in X$, then $f$ is
constant. This is nice as $f$ is not assumed to be differentiable. The result is used to prove that
extrema to a variational problem $I(x) = \int_a^b L(t, x, x') \, dt$ are weak solutions of the Euler
Lagrange equations $L_x = d/dt \, L_{x'}$. See [248] [503].


45. INTEGRABLE SYSTEMS


Consider a Hamilton differential equation $x' = J \nabla H(x)$ on a compact symplectic $2n$-manifold
$(M,\omega)$. The almost complex structure $J : T^*M \to TM$ is tied to $\omega$ using a
Riemannian metric $g$ by $\omega(v,w) = \langle v, Jw \rangle$. A function $F : M \to \mathbb{R}$ is called a first integral
if $d/dt \, F(x(t)) = 0$ for all $t$. An example is the Hamiltonian function $H$ itself. A set of
integrals $F_1, \ldots, F_d$ Poisson commutes if $\{F_j, F_k\} = J \nabla F_j \cdot \nabla F_k = 0$ for all $k,j$. They
are linearly independent, if at every point the vectors $\nabla F_j$ are linearly independent in the
sense of linear algebra. A system is Liouville integrable if there are $d = n$ linearly independent,
Poisson commuting integrals. The following theorem due to Liouville and Arnold characterizes
the level surfaces $\{F = c\} = \{F_1 = c_1, \ldots, F_d = c_d\}$:


Theorem: For a Liouville integrable system, level surfaces $F = c$ are tori.


An example how to get integrals is to write the system as an isospectral deformation of
an operator $L$. This is called a Lax system. Such a differential equation has the form
$L' = [B, L]$, where $B = B(L)$ is skew symmetric. An example is the periodic Toda system
$a_n' = a_n(b_{n+1} - b_n)$, $b_n' = 2(a_n^2 - a_{n-1}^2)$, where $(Lu)_n = a_n u_{n+1} + a_{n-1} u_{n-1} + b_n u_n$ and
$(Bu)_n = a_n u_{n+1} - a_{n-1} u_{n-1}$. Another example is the motion of a rigid body in $n$ dimensions if the
center of mass is fixed. See [22].
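
Isospectrality of the periodic Toda system can be observed numerically. The sketch below
integrates the equations for $a_n, b_n$ with a classical Runge-Kutta step and monitors
$\mathrm{tr}(L) = \sum_n b_n$ and $\mathrm{tr}(L^2) = \sum_n b_n^2 + 2 \sum_n a_n^2$ (valid for period at least 3),
which should stay constant. The initial data, step size and function names are assumptions
made for illustration.

```python
def toda_rhs(a, b):
    """Right hand side a_n' = a_n (b_{n+1} - b_n), b_n' = 2 (a_n^2 - a_{n-1}^2)."""
    n = len(a)
    da = [a[k] * (b[(k + 1) % n] - b[k]) for k in range(n)]
    db = [2.0 * (a[k] ** 2 - a[k - 1] ** 2) for k in range(n)]  # a[-1] closes the period
    return da, db


def rk4_step(a, b, h):
    """One classical fourth order Runge-Kutta step of size h."""
    def shift(state, deriv, s):
        return [x + s * d for x, d in zip(state, deriv)]

    k1a, k1b = toda_rhs(a, b)
    k2a, k2b = toda_rhs(shift(a, k1a, h / 2), shift(b, k1b, h / 2))
    k3a, k3b = toda_rhs(shift(a, k2a, h / 2), shift(b, k2b, h / 2))
    k4a, k4b = toda_rhs(shift(a, k3a, h), shift(b, k3b, h))
    new_a = [x + h / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(a, k1a, k2a, k3a, k4a)]
    new_b = [x + h / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(b, k1b, k2b, k3b, k4b)]
    return new_a, new_b


def invariants(a, b):
    """tr(L) and tr(L^2) of the periodic Jacobi matrix L."""
    return sum(b), sum(x * x for x in b) + 2.0 * sum(x * x for x in a)


a, b = [1.0, 0.5, 2.0, 1.5], [0.3, -0.2, 0.1, 0.4]   # assumed initial data, period 4
start = invariants(a, b)
for _ in range(1000):                                # integrate up to time 1
    a, b = rk4_step(a, b, 0.001)
end = invariants(a, b)
```

The values in `start` and `end` agree up to the discretization error of the integrator, while
the individual entries $a_n, b_n$ change.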


46. HARMONIC ANALYSIS 


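On the vector space $X$ of continuously differentiable $2\pi$-periodic, complex-valued functions,
define the inner product $\langle f,g \rangle = (2\pi)^{-1} \int_0^{2\pi} f(x) \overline{g(x)} \, dx$. The Fourier coefficients of $f$ are
$\hat{f}_n = \langle f, e_n \rangle$, where $\{e_n(x) = e^{inx}\}_{n \in \mathbb{Z}}$ is the Fourier basis. The Fourier series of $f$ is the
sum $\sum_{n \in \mathbb{Z}} \hat{f}_n e^{inx}$.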


Theorem: The Fourier series of f € X converges point-wise to f. 


Already Fourier claimed this always to be true in his “Théorie Analytique de la Chaleur".
After many fallacious proofs, Dirichlet gave the first proof of convergence [412]. The case is
subtle because there are continuous functions for which the convergence fails at some points.
Lipót Fejér was able to show that for a continuous function $f$, the coefficients $\hat{f}_n$ nevertheless
determine the function using Cesàro convergence. See [376].


47. JENSEN INEQUALITY 


If $V$ is a vector space, a subset $X$ of $V$ is called convex if for all points $a,b \in X$, the line segment
$\{tb + (1-t)a \mid t \in [0,1]\}$ is contained in $X$. A real-valued function $\phi : X \to \mathbb{R}$ is called convex
if $\phi(tb + (1-t)a) \leq t\phi(b) + (1-t)\phi(a)$ for all $a,b \in X$ and all $t \in [0,1]$. Let now $(\Omega, \mathcal{A}, P)$ be a
probability space, and $f \in L^1(\Omega, P)$ an integrable function. We write $E[f] = \int_{\Omega} f(x) \, dP(x)$
for the expectation of $f$. For any convex $\phi : \mathbb{R} \to \mathbb{R}$ and $f \in L^1(\Omega, P)$, we have the Jensen
inequality


Theorem: $\phi(E[f]) \leq E[\phi(f)]$.


For $\phi(x) = \exp(x)$ and a finite probability space $\Omega = \{1, 2, \ldots, n\}$ with $f(k) = y_k = \log(x_k)$
and $P[\{k\}] = 1/n$, this gives the arithmetic mean-geometric mean inequality
$(x_1 x_2 \cdots x_n)^{1/n} \leq (x_1 + x_2 + \cdots + x_n)/n$. The case $\phi(x) = e^x$ is useful in general as it leads to the
inequality $e^{E[f]} \leq E[e^f]$ if $e^f \in L^1$. For $f \in L^2(\Omega, P)$ one gets $(E[f])^2 \leq E[f^2]$ which reflects the
fact that $E[f^2] - (E[f])^2 = E[(f - E[f])^2] = \mathrm{Var}[f] \geq 0$ where $\mathrm{Var}[f]$ is the variance of $f$.


48. JORDAN CURVE THEOREM 


A closed curve is the image of a continuous map $r : \mathbb{T} \to \mathbb{R}^2$, where $\mathbb{T}$ is the circle. It is called simple, if this map
$r$ is injective. One then calls the map an embedding and the image a topological 1-sphere,
meaning that it is homeomorphic to the standard circle $x^2 + y^2 = 1$ in $\mathbb{R}^2$. The image is then
called a Jordan curve. The Jordan curve theorem deals with such simple closed curves $S$
in the two-dimensional plane.


Theorem: A simple closed curve divides the plane into two regions. 


The Jordan curve theorem is due to Camille Jordan. His proof [856] was objected to at first [388]
but rehabilitated in [290]. The theorem can be strengthened: a theorem of Schoenflies tells
that each of the two regions is homeomorphic to the disk $\{(x,y) \in \mathbb{R}^2 \mid x^2 + y^2 < 1\}$. In the
smooth case, it is even possible to extend the map to a diffeomorphism in the plane. In higher
dimensions, one knows that an embedding of the $(d-1)$-dimensional sphere in $\mathbb{R}^d$ divides
space into two regions. This is the Jordan-Brouwer separation theorem. It is no longer true
in general that the two parts are homeomorphic to $\{x \in \mathbb{R}^d \mid |x| < 1\}$: a counter example
is the Alexander horned sphere which is a topological 2-sphere but where the unbounded
component is not simply connected and so not homeomorphic to the complement of a unit ball.
See [84].


49. CHINESE REMAINDER THEOREM 


Given integers $a,b$, a linear modular equation or congruence $ax + b = 0 \bmod m$ asks to
find an integer $x$ such that $ax + b$ is divisible by $m$. This linear equation can always be solved
if $a$ and $m$ are coprime. The Chinese remainder theorem deals with the system of linear
modular equations $x = b_1 \bmod m_1$, $x = b_2 \bmod m_2$, $\ldots$, $x = b_n \bmod m_n$, where $m_k$ are the
moduli. More generally, for an integer $n \times n$ matrix $A$ we call $Ax = b \bmod m$ a Chinese
remainder theorem system or shortly CRT system if the $m_j$ are pairwise relatively prime
and in each row there is a matrix element $A_{ij}$ relatively prime to $m_i$.


Theorem: Every Chinese remainder theorem system has a solution. 


The classical single variable case is when $A_{i1} = 1$ and $A_{ij} = 0$ for $j > 1$. Let $M =
m_1 m_2 \cdots m_n$ be the product. In this one-dimensional case, the result implies that $x \bmod M
\to (x \bmod m_1, \ldots, x \bmod m_n)$ is a ring isomorphism. Define $M_i = M/m_i$. An explicit
algorithm is to find numbers $y_i, z_i$ with $y_i M_i + z_i m_i = 1$ (finding $y, z$ solving $ay + bz = 1$ for
coprime $a,b$ is done using the Euclidean algorithm), and then to form $x = b_1 M_1 y_1 + \cdots +
b_n M_n y_n$. See [467]. The multi-variable version appeared in 2005 and can be found
also in [643].
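
The explicit algorithm for the classical single variable case can be written out as follows.
This is a minimal sketch with hypothetical function names; it assumes pairwise coprime positive
moduli and follows the construction $x = b_1 M_1 y_1 + \cdots + b_n M_n y_n$ with $y_i M_i + z_i m_i = 1$.

```python
def extended_gcd(a, b):
    """Return (g, y, z) with a*y + b*z == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, y, z = extended_gcd(b, a % b)
    return g, z, y - (a // b) * z


def chinese_remainder(residues, moduli):
    """Solve x = b_i mod m_i for pairwise coprime moduli m_i; result lies in [0, M)."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for b_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        g, y_i, _ = extended_gcd(M_i, m_i)   # y_i * M_i + z_i * m_i = 1
        if g != 1:
            raise ValueError("moduli must be pairwise coprime")
        x += b_i * M_i * y_i
    return x % M


x = chinese_remainder([2, 3, 2], [3, 5, 7])  # x = 2 mod 3, x = 3 mod 5, x = 2 mod 7
```

Each summand $b_i M_i y_i$ is congruent to $b_i$ modulo $m_i$ and to 0 modulo the other moduli,
which is why the sum solves all congruences at once.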


50. BEZOUT’S THEOREM 


A polynomial is homogeneous if the total degree of all its monomials is the same. A
homogeneous polynomial $f$ in $n+1$ variables of degree $d \geq 1$ defines a projective hypersurface
$f = 0$. Given $n$ projective irreducible hypersurfaces $f_k = c_k$ of degree $d_k$ in a projective space
$\mathbb{P}^n$ we can look at the solution set $\{f = c\} = \{f_1 = c_1, \ldots, f_n = c_n\}$ of a system of nonlinear
equations. The Bézout bound is $d = d_1 \cdots d_n$, the product of the degrees. Bézout’s theorem
allows to count the number of solutions of the system, where the number of solutions is
counted with multiplicity.


Theorem: The set {f = c} is either infinite or has d elements. 


Bézout’s theorem was stated in the “Principia" of Newton in 1687 but was proven first only in
1779 by Étienne Bézout. If the hypersurfaces are all irreducible and in “general position", then
there are exactly $d$ solutions and each has multiplicity 1. This can be used also for affine surfaces.
If $y^2 - x^3 - 3x - 5 = 0$ is an elliptic curve for example, then $y^2 z - x^3 - 3xz^2 - 5z^3 = 0$ is a projective
hypersurface, its projective completion. Bézout’s theorem implies part of the fundamental
theorem of algebra as for $n = 1$, when we have only one homogeneous equation, we have $d$ roots
to a polynomial of degree $d$. The theorem implies for example that two conic
sections have in general 4 intersection points. The example $x^2 - yz = 0$, $x^2 + z^2 - yz = 0$
has only the solution $x = z = 0$, $y = 1$ but with multiplicity 4. As non-linear systems of
equations appear frequently in computer algebra this theorem gives a lower bound on the
computational complexity for solving such problems.


51. GROUP THEORY 


A finite group $(G, *, 1)$ is a finite set containing a unit $1 \in G$ and a binary operation
$* : G \times G \to G$ satisfying the associativity property $(x * y) * z = x * (y * z)$ and such that for
every $x$, there exists a unique $y = x^{-1}$ such that $x * y = y * x = 1$. The order $n$ of the group
is the number of elements in the group. An element $x \in G$ generates a subgroup formed by
$1, x, x^2 = x * x, \dots$. This is the cyclic subgroup $C(x)$ generated by $x$. Lagrange’s theorem
states


Theorem: |C(x)| is a factor of |G| 


The origins of group theory go back to Joseph Louis Lagrange, Paolo Ruffini and Évariste
Galois. The concept of abstract group appeared first in the work of Arthur Cayley. Given a
subgroup $H$ of $G$, the left cosets of $H$ are the equivalence classes of the equivalence relation
$x \sim y$ if there exists $z \in H$ with $x = z * y$. The equivalence classes $G/H$ partition $G$.
The number $[G : H]$ of elements in $G/H$ is called the index of $H$ in $G$. It follows that
$|G| = |H| [G : H]$ and more generally that if $K$ is a subgroup of $H$ and $H$ is a subgroup of $G$,
then $[G : K] = [G : H][H : K]$. A subgroup $N$ of $G$ is called a normal subgroup,
written $N \triangleleft G$, if for all $a \in N$ and all $x$ in $G$ the element $x * a * x^{-1}$ is in $N$. This can be rewritten as
$N * x = x * N$. If $N$ is a normal subgroup, then $G/N$ is again a group, the quotient group. For
example, if $f : G \to G'$ is a group homomorphism, then the kernel of $f$ is a normal subgroup
and $|G| = |\ker(f)| |\mathrm{im}(f)|$ because of the first group isomorphism theorem.
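
Lagrange’s theorem can be checked numerically on a small example. The sketch below uses the group of units modulo $n$ under multiplication as an assumed test case; the helper names `unit_group` and `cyclic_subgroup` are hypothetical.

```python
from math import gcd


def unit_group(n):
    """Elements of the multiplicative group of integers modulo n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def cyclic_subgroup(x, n):
    """Return C(x) = [1, x, x*x, ...] inside the unit group modulo n."""
    if gcd(x, n) != 1:
        raise ValueError("x must be coprime to n")
    elements, power = [1], x % n
    while power != 1:
        elements.append(power)
        power = (power * x) % n
    return elements


G = unit_group(15)
# Every cyclic subgroup order divides the group order.
orders = {x: len(cyclic_subgroup(x, 15)) for x in G}
```

Here `G` has 8 elements and every value in `orders` is a divisor of 8.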


52. PRIMES 


A rational prime (or simply “prime") is an integer larger than 1 which is only divisible
by 1 and itself. The Wilson theorem allows one to define a prime as a number $n > 1$ for which
$(n - 1)! + 1$ is divisible by $n$. Euclid already knew that there are infinitely many primes (if
there were finitely many $p_1, \dots, p_n$, the new number $p_1 p_2 \cdots p_n + 1$ would have a prime factor
different from the given set). It also follows from the divergence of the harmonic series
$\zeta(1) = \sum_{n=1}^{\infty} 1/n = 1 + 1/2 + 1/3 + \cdots$ and the Euler golden key or Euler product
$\zeta(s) = \sum_{n=1}^{\infty} 1/n^s = \prod_{p \text{ prime}} (1 - 1/p^s)^{-1}$ for the Riemann zeta function $\zeta(s)$ that there are
infinitely many primes as otherwise, the product to the right would be finite.

Let $\pi(x)$ be the prime-counting function which gives the number of primes smaller or equal
to $x$. Given two functions $f(x), g(x)$ from the integers to the integers, we say $f \sim g$, if
$\lim_{x \to \infty} f(x)/g(x) = 1$. The prime number theorem states


Theorem: $\pi(x) \sim x/\log(x)$.


The result was investigated experimentally first by Anton Felkel and Jurij Vega. Adrien-Marie
Legendre first conjectured in 1797 a law of this form. Carl Friedrich Gauss wrote in 1849 that
he experimented independently around 1792 with such a law. The theorem was proven in 1896
by Jacques Hadamard and Charles de la Vallée Poussin. Proofs without complex analysis were
put forward by Atle Selberg and Paul Erdős in 1949. A simple analytic proof was given by
Donald Newman in [517]. The prime number theorem also assures that there are infinitely
many primes but it makes the statement quantitative in that it gives an idea how fast the
number of primes grows asymptotically. Under the assumption of the Riemann hypothesis,
Lowell Schoenfeld proved $|\pi(x) - \mathrm{li}(x)| < \sqrt{x} \log(x)/(8\pi)$, where $\mathrm{li}(x) = \int_0^x dt/\log(t)$ is the
logarithmic integral.
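
Both the Wilson criterion and the asymptotic law can be explored experimentally. The following sketch is only an illustration: the names `is_prime_wilson` and `prime_count` are hypothetical, the Wilson test is far too slow for practical use, and the sieve of Eratosthenes is a standard technique swapped in for counting.

```python
from math import factorial, isqrt, log


def is_prime_wilson(n):
    """Wilson criterion: n > 1 is prime iff (n-1)! + 1 is divisible by n."""
    return n > 1 and (factorial(n - 1) + 1) % n == 0


def prime_count(x):
    """pi(x): number of primes <= x, computed with a sieve of Eratosthenes."""
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(x) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... of the prime p.
            sieve[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


x = 10**5
ratio = prime_count(x) / (x / log(x))  # tends to 1 as x grows, but slowly
```

The ratio is still visibly above 1 at this size, which illustrates how slow the convergence in the prime number theorem is.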


53. CELLULAR AUTOMATA 


A finite set $A$ called alphabet and an integer $d \geq 1$ define the compact topological space
$\Omega = A^{\mathbb{Z}^d}$ of all infinite $d$-dimensional configurations. The topology is the product topology,
which is compact by the Tychonov theorem. The translation maps $T_i(x)_n = x_{n+e_i}$ are
homeomorphisms of $\Omega$ called shifts. A closed $T$-invariant subset $X \subset \Omega$ defines a subshift $(X, T)$. An
automorphism $T$ of $\Omega$ which commutes with the translations $T_i$ is called a cellular automaton,
abbreviated CA. An example of a cellular automaton is a map $T(x)_n = \phi(x_{n+u_1}, \dots, x_{n+u_k})$, where
$U = \{u_1, \dots, u_k\} \subset \mathbb{Z}^d$ is a fixed finite set. It is called a local automaton because it is defined
by a finite rule so that the status of the cell $n$ at the next step depends only on the status of
the “neighboring cells" $\{n + u \mid u \in U\}$. The following result is the Curtis-Hedlund-Lyndon
theorem:


Theorem: Every cellular automaton is a local automaton. 


Cellular automata were introduced by John von Neumann and studied mathematically in 1969 by
Hedlund [307]. The result appears there. Hedlund saw cellular automata also as maps on
subshifts. One can so look at cellular automata on subclasses of subshifts. For example,
one can restrict the cellular automata map $T$ to almost periodic configurations, which are
subsets $X$ of $\Omega$ on which $(X, T_1, \dots, T_d)$ has only invariant measures $\mu$ for which the Koopman
operators $U_j f = f(T_j)$ on $L^2(X, \mu)$ have pure point spectrum. A particularly well studied case
is $d = 1$, $A = \{0, 1\}$ and $U = \{-1, 0, 1\}$, where the automaton is called an elementary
cellular automaton. The Wolfram numbering labels the $2^8 = 256$ possible elementary automata
with a number between 0 and 255. The game of life of Conway is a case for $d = 2$, $A = \{0, 1\}$ and
$U = \{-1, 0, 1\} \times \{-1, 0, 1\}$. For literature on cellular automata, see the treatments as part of complex
systems [704] or evolutionary dynamics [520]. For topological dynamics, see [162].
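
An elementary cellular automaton is easy to simulate. The sketch below assumes a finite configuration with periodic boundary conditions as a stand-in for the infinite configurations of the text; the function name `elementary_ca_step` is hypothetical. The bits of the Wolfram number encode the local rule $\phi$ on the eight possible neighborhoods.

```python
def elementary_ca_step(cells, rule):
    """Apply one step of the elementary CA with Wolfram number `rule` (0..255).

    `cells` is a list of 0/1 values, treated as cyclic (an assumption made
    here so that a finite list can be used).
    """
    if not 0 <= rule <= 255:
        raise ValueError("rule must be between 0 and 255")
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * center + right  # neighborhood read as a binary number
        new_cells.append((rule >> index) & 1)  # look up the corresponding bit of the rule
    return new_cells


state = [0, 0, 0, 1, 0, 0, 0]
next_state = elementary_ca_step(state, 90)  # rule 90: XOR of the two neighbors
```

Rule 204 is the identity map in this numbering, which gives a quick sanity check of the bit convention.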


54. TOPOS THEORY


A category has objects as nodes and morphisms as arrows going from one object to another
object. There can be multiple connections and self-loops so that one can visualize a
category as a quiver. Every object $A$ has the identity arrow $1_A$. A topos $X$ is a Cartesian
closed category $C$ in which finite limits exist and which has a sub-object classifier $\Omega$
allowing to identify sub-objects with morphisms from $X$ to $\Omega$. Cartesian closed means
that one can define for any pair of objects $A, B$ in $C$ the product $A \times B$ and an equalizer
representing solutions $f = g$ to arrows $f : A \to B$, $g : A \to B$ as well as an exponential
$B^A$ representing all arrows from $A$ to $B$. An example is the topos of sets. An example of a
sub-object classifier is $\Omega = \{0, 1\}$ encoding “true or false".

The slice category $E/X$ of a category $E$ with an object $X$ in $E$ is a category, where the objects
are the arrows $A \to X$ of $E$. An $E/X$ arrow between objects $f : A \to X$ and $g : B \to X$ is
a map $s : A \to B$ which produces a commutative triangle in $E$. The composition is pasting
triangles together. The fundamental theorem of topos theory is:


Theorem: The slice category $E/X$ of a topos $E$ is a topos.


For example, if $E$ is the topos of sets, then the slice category is the category of pointed
sets: the objects are then sets together with a function selecting a point as a “base point".
A morphism $f : A \to B$ defines a functor $E/B \to E/A$ which preserves exponentials and the
subobject classifier $\Omega$. Topos theory was motivated by geometry (Grothendieck), physics
(Lawvere), topology (Tierney) and algebra (Kan). It can be seen as a generalization and
even a replacement of set theory: Lawvere’s elementary theory of the category of
sets (ETCS) is seen as a part of ZFC which is less likely to be inconsistent [441]. For a short
introduction see [845], for textbooks [480] [108], for the history of topos theory in particular, see [479].


55. TRANSCENDENTALS 


A root of an equation $f(x) = 0$ with integer polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$
with $n > 0$ and $a_k \in \mathbb{Z}$ is called an algebraic number. The set $A$ of algebraic numbers is a
sub-field of the field $\mathbb{R}$ of real numbers. The field $A$ is the algebraic closure of the rational
numbers $\mathbb{Q}$. It is of number theoretic interest as it contains all algebraic number fields, finite
degree field extensions of $\mathbb{Q}$. The complement $\mathbb{R} \setminus A$ is the set of transcendental numbers.
Transcendental numbers are necessarily irrational because every rational number $x = p/q$ is
algebraic, solving $qx - p = 0$. Because the set of algebraic numbers is countable and the real
numbers are not, most numbers are transcendental. The group of all automorphisms of $A$ which
fix $\mathbb{Q}$ is called the absolute Galois group of $\mathbb{Q}$.


Theorem: $\pi$ and $e$ are transcendental.


This result is due to Ferdinand von Lindemann. He proved that $e^x$ is transcendental for every
non-zero algebraic number $x$. This immediately implies $e$ is transcendental. Now, if $\pi$ were
algebraic, then $\pi i$ would be algebraic and $e^{\pi i} = -1$ would be transcendental. But $-1$ is
rational. Lindemann’s result was extended in 1885 by Karl Weierstrass to the statement
that if $x_1, \dots, x_n$ are linearly independent algebraic numbers, then $e^{x_1}, \dots, e^{x_n}$ are algebraically
independent. The transcendental property of $\pi$ also proves that $\pi$ is irrational. This is easier
to prove directly. See [333].


56. RECURRENCE


A homeomorphism $T : X \to X$ of a compact topological space $X$ defines a topological
dynamical system $(X, T)$. We write $T^j(x) = T(T(\dots T(x)))$ to indicate that the map $T$ is
applied $j$ times. For any $d > 0$, we get from this a set $(T_1, T_2, \dots, T_d)$ of commuting
homeomorphisms on $X$, where $T_j(x) = T^j x$. A point $x \in X$ is called multiple recurrent for $T$ if for
every $d > 0$, there exists a sequence $n_1 < n_2 < n_3 < \cdots$ of integers $n_k \in \mathbb{N}$ for which $T_j^{n_k} x \to x$
for $k \to \infty$ and all $j = 1, \dots, d$. Furstenberg’s multiple recurrence theorem states:


Theorem: Every topological dynamical system is multiple recurrent. 


It is known even that the set of multiple recurrent points is Baire generic. Hillel Furstenberg
proved this result in 1975. There is a parallel theorem for measure preserving systems:
an automorphism $T$ of a probability space $(\Omega, \mathcal{A}, P)$ is called multiple recurrent if there
exists $A \in \mathcal{A}$ and an integer $n$ such that $P[A \cap T_1^n(A) \cap \cdots \cap T_d^n(A)] > 0$. This generalizes the
Poincaré recurrence theorem, which is the case $d = 1$. Recurrence theorems are related
to the Szemerédi theorem telling that a subset $A$ of $\mathbb{N}$ of positive upper density contains
arithmetic progressions of arbitrary finite length. See [236].


57. SOLVABILITY 


A basic task in mathematics is to solve polynomial equations $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots +
a_1 x + a_0 = 0$ with complex coefficients $a_k$ using explicit formulas involving roots. One calls
this finding an explicit algebraic solution. The linear case $ax + b = 0$ with $x = -b/a$ and the
quadratic case $ax^2 + bx + c = 0$ with $x = (-b \pm \sqrt{b^2 - 4ac})/(2a)$ were known since antiquity.
The cubic $x^3 + ax^2 + bx + c = 0$ was solved by Niccolò Tartaglia and Gerolamo Cardano: a
first substitution $x = X - a/3$ produces the depressed cubic $X^3 + pX + q = 0$ (first solved by
Scipione dal Ferro). The substitution $X = u - p/(3u)$ then produces a quadratic equation for
$u^3$. Lodovico Ferrari finally solved the quartic by reducing it to the cubic. It was Paolo Ruffini,
Niels Abel and Évariste Galois who realized that there are no algebraic solution formulas any
more for polynomials of degree $n \geq 5$.
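
The substitution $X = u - p/(3u)$ turns $X^3 + pX + q = 0$ into $u^6 + q u^3 - p^3/27 = 0$, a quadratic equation for $w = u^3$. The sketch below follows this route numerically with complex arithmetic; the function name `depressed_cubic_roots` is hypothetical, and picking the root $w$ of larger modulus is an assumption made here to avoid dividing by zero.

```python
import cmath


def depressed_cubic_roots(p, q):
    """Roots of X^3 + p*X + q = 0 via the substitution X = u - p/(3u)."""
    # w = u^3 solves the quadratic w^2 + q*w - p^3/27 = 0.
    disc = cmath.sqrt(q * q + 4 * p**3 / 27)
    w = max((-q + disc) / 2, (-q - disc) / 2, key=abs)
    if w == 0:  # only happens for p = q = 0, where X = 0 is a triple root
        return [0j, 0j, 0j]
    u0 = w ** (1 / 3)                      # one complex cube root of w
    omega = cmath.exp(2j * cmath.pi / 3)   # primitive third root of unity
    return [u - p / (3 * u) for u in (u0 * omega**k for k in range(3))]


roots = depressed_cubic_roots(-6, -9)  # X^3 - 6X - 9 = 0 has the real root X = 3
```

The three values of $u$ differ by third roots of unity and produce the three roots of the cubic, one of which is real in this example.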


Theorem: Explicit algebraic solutions to $p(x) = 0$ exist if and only if $n \leq 4$.


The quadratic case was settled over a longer period in independent developments in Babylonian,
Egyptian, Chinese and Indian mathematics. The cubic and quartic discoveries were dramatic,
culminating with Cardano’s book of 1545, marking the beginning of modern algebra. After
centuries of failures to solve the quintic, Paolo Ruffini published the first proof in 1799, a
proof which had a gap but which paved the way for Niels Henrik Abel and Évariste Galois. For
further discoveries see [473] [448] [13].


58. GALOIS THEORY 


If $F$ is a sub-field of $E$, then $E$ is a vector space over $F$. The dimension of this vector space is
called the degree $[E : F]$ of the field extension $E/F$. The field extension is called finite if
$[E : F]$ is finite. A field extension is called transcendental if there exists an element in $E$ which
is not a root of a polynomial $f$ with coefficients in $F$. Otherwise, the extension is called
algebraic. In the latter case, there exists a unique monic polynomial $f$ which is irreducible over
$F$ and the field extension is finite. An algebraic field extension $E/F$ is called normal if every
irreducible polynomial over $F$ with at least one root in $E$ splits over $E$ into linear factors. An
algebraic field extension $E/F$ is called separable if the associated irreducible polynomial $f$ is
separable, meaning that $f'$ is not zero. This means that $F$ has zero characteristic or that $f$
is not of the form $\sum_k a_k x^{pk}$ if $F$ has characteristic $p$. A field extension is called Galois if it
is normal and separable. Let Fields($E/F$) be the set of subfields of $E/F$ and Groups($E/F$)
the set of subgroups of the automorphism group Aut($E/F$). The Fundamental theorem of
Galois theory assures:


Theorem: Fields($E/F$) $\leftrightarrow$ Groups($E/F$) if $E/F$ is Galois.


The intermediate fields of $E/F$ are so described by groups. It implies the Abel-Ruffini
theorem about the non-solvability of the quintic by radicals. The fundamental theorem
demonstrates that solvable extensions correspond to solvable groups. The symmetry groups of
permutations of 5 or more elements are no longer solvable. See [631].


59. METRIC SPACES 


A topological space $(X, O)$ is given by a set $X$ and a collection $O$ of subsets of $X$ with
the property that the empty set $\emptyset$ and $X$ both belong to $O$ and that $O$ is closed under arbitrary
unions and finite intersections. The sets in $O$ are called open sets. Metric spaces $(X, d)$ are
special topological spaces. In that case, $O$ consists of all sets $U$ such that for every $x \in U$ there
exists $r > 0$ such that the open ball $B_r(x) = \{y \in X \mid d(x, y) < r\}$ is contained in $U$. Two
topological spaces $(X, O)$, $(Y, Q)$ are homeomorphic if there exists a bijection $f : X \to Y$
such that $f$ and $f^{-1}$ are both continuous. A function $f : X \to Y$ is continuous if $f^{-1}(A) \in O$
for all $A \in Q$. When is a topological space homeomorphic to a metric space? The Urysohn
metrization theorem gives an answer: we need the regular Hausdorff property, meaning
that a closed set $K$ and a point $y$ not in $K$ can be separated by disjoint neighborhoods $K \subset U$, $y \in V$.
We also need the space to be second countable, meaning that there is a countable topological
base (a topological base in $O$ is a subset $B \subset O$ such that every $U \in O$ can be written as a
union of elements in $B$).


Theorem: A second countable regular Hausdorff space is metrizable. 


The result was proven by Pavel Urysohn in 1925 with “regular" replaced by “normal" and by 
Andrey Tychonov in 1926. It follows that a compact Hausdorff space is metrizable if and only 
if it is second countable. For literature, see [84]. 


60. FIXED POINT 


Given a continuous transformation $T : X \to X$ of a compact topological space $X$, one
can look for the fixed point set $\mathrm{Fix}_T(X) = \{x \mid T(x) = x\}$. This is useful for finding
periodic points, as fixed points of $T^n = T \circ T \circ \cdots \circ T$ are periodic points of period $n$.
If $X$ has a finite cohomology, like if $X$ is a compact $d$-manifold with boundary, one can
look at the linear map $T_p$ induced on the cohomology groups $H^p(X)$. The super trace
$\chi_T(X) = \sum_{p=0}^{d} (-1)^p \mathrm{tr}(T_p)$ is called the Lefschetz number of $T$ on $X$. If $T$ is the identity,
this is the Euler characteristic. Let $\mathrm{ind}_T(x)$ be the Brouwer degree of the map $T$ induced
on a small $(d-1)$-sphere $S$ centered at $x$. This is the trace of the linear map $T_{d-1}$ induced
from $T$ on the cohomology group $H^{d-1}(S)$, which is an integer. If $T$ is differentiable and $dT(x)$
is invertible, the Brouwer degree is $\mathrm{ind}_T(x) = \mathrm{sign}(\det(dT))$. Let $\mathrm{Fix}_T(X)$ denote the set of
fixed points of $T$. The Lefschetz-Hopf fixed point theorem is


Theorem: If $\mathrm{Fix}_T(X)$ is finite, then $\chi_T(X) = \sum_{x \in \mathrm{Fix}_T(X)} \mathrm{ind}_T(x)$.


A special case is the Brouwer fixed point theorem, where $X$ is a compact convex subset of
Euclidean space. In that case $\chi_T(X) = 1$ and the theorem assures the existence of a fixed
point. In particular, if $T : D \to D$ is a continuous map from the disc $D = \{x^2 + y^2 \leq 1\}$
to itself, then $T$ has a fixed point. This Brouwer fixed point theorem was proved in
1910 by Jacques Hadamard and Luitzen Egbertus Jan Brouwer. The Schauder fixed point
theorem from 1930 generalizes the result to convex compact subsets of Banach spaces. The
Lefschetz-Hopf fixed point theorem was given in 1926. For literature, see [180] [73].


61. QUADRATIC RECIPROCITY 


Given a prime $p$, a number $a$ is called a quadratic residue if there exists a number $x$ such
that $x^2$ has remainder $a$ modulo $p$. In other words, quadratic residues are the squares in the
field $\mathbb{Z}_p$. The Legendre symbol $(a|p)$ is defined to be 0 if $a$ is 0 or a multiple of $p$, and 1 if
$a$ is a non-zero residue of $p$ and $-1$ if it is not. While the integer 0 is sometimes considered
to be a quadratic residue we don’t include it as it is a special case. Also, in the multiplicative
group $\mathbb{Z}_p^*$ without zero, there is a symmetry: there are the same number of quadratic residues
and non-residues. This is made more precise in the law of quadratic reciprocity


Theorem: For any two distinct odd primes $p, q$: $(p|q)(q|p) = (-1)^{\frac{p-1}{2} \frac{q-1}{2}}$.


This means that $(p|q) = -(q|p)$ if and only if both $p$ and $q$ have remainder 3 modulo 4. The
odd primes of the form $4k + 3$ are also prime in the Gaussian integers. To remember
the law, one can think of them as “Fermions"; quadratic reciprocity then tells that Fermions
anti-commute. The odd primes of the form $4k + 1$ factor by the two-square theorem
in the Gaussian plane as $p = (a + ib)(a - ib)$; they are a product of two Gaussian primes
and are therefore Bosons. One can remember the rule because Bosons commute with other
particles, so that if either $p$ or $q$ or both are “Bosonic", then $(p|q) = (q|p)$. The law of quadratic
reciprocity was first conjectured by Euler and Legendre and published by Carl Friedrich Gauss
in his Disquisitiones Arithmeticae of 1801 (Gauss found the first proof in 1796). See [333].
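
The law can be verified for small primes by direct computation. The sketch below evaluates the Legendre symbol with Euler’s criterion $a^{(p-1)/2} \bmod p$, a standard technique that is not discussed in the text above; the function name `legendre` is hypothetical.

```python
def legendre(a, p):
    """Legendre symbol (a|p) for an odd prime p, using Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23]
# Check (p|q)(q|p) = (-1)^(((p-1)/2) * ((q-1)/2)) for all distinct pairs.
law_holds = all(
    legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    for p in odd_primes
    for q in odd_primes
    if p != q
)
```

For instance, 3 and 7 are both of the form $4k + 3$, so the two symbols $(3|7)$ and $(7|3)$ have opposite signs.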


62. QUADRATIC MAP 


Every quadratic map $z \to f(z) = z^2 + bz + d$ in the complex plane is conjugated by a linear
transformation to one of the quadratic family maps $T_c(z) = z^2 + c$. The Mandelbrot set
$M = \{c \in \mathbb{C} \mid T_c^n(0) \text{ stays bounded}\}$ is also called the connectedness locus of the quadratic
family because for $c \in M$, the Julia set $J_c = \{z \in \mathbb{C} \mid T_c^n(z) \text{ stays bounded}\}$ is connected and
for $c \notin M$, the Julia set $J_c$ is a Cantor set. The fundamental theorem for quadratic dynamical
systems is:


Theorem: The Mandelbrot set is connected. 


Mandelbrot first thought, after doing experiments and picturing the set using a computer and
printing it out, that it was disconnected. The theorem is due to Adrien Douady and John
Hubbard in 1982. One can also look at the connectedness locus for $T_c(z) = z^d + c$, which leads
to Multibrot sets, or the map $z \to \bar{z}^2 + c$, which leads to the tricorn or mandelbar, which is
not path connected. One does not know whether the Mandelbrot set $M$ is locally connected,
nor whether it is path connected. See [110].
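
Membership in the Mandelbrot set can be explored with the usual escape-time test: once $|z| > 2$ the orbit of 0 under $z \to z^2 + c$ is known to escape to infinity. The sketch below is a numerical heuristic only; the name `stays_bounded` and the iteration cap `max_iter` are assumptions, and a return value of `True` just means that no escape was seen within the cap.

```python
def stays_bounded(c, max_iter=200):
    """Heuristic test whether the orbit of 0 under z -> z*z + c stays bounded."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:   # the orbit provably escapes once |z| > 2
            return False
    return True          # no escape observed within max_iter steps


samples = {c: stays_bounded(c) for c in (0, -1, 1, 1j)}
```

The parameters 0, -1 and $i$ have periodic or eventually periodic orbits of 0, while $c = 1$ produces the escaping orbit 0, 1, 2, 5, ...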


63. DIFFERENTIAL EQUATIONS


Let us say that a differential equation $x'(t) = F(x(t))$ is integrable if a trajectory $x(t)$ either
converges to infinity, or to an equilibrium point or to a limit cycle or to a limiting torus,
where it is a periodic or almost periodic trajectory. We assume that $F$ has global solutions,
meaning that a unique solution $x(t)$, $t \geq 0$ solving $x' = F(x)$ exists for all times. The
Poincaré-Bendixson theorem is:


Theorem: Any differential equation in the plane is integrable. 


This changes in dimensions 3 and higher. The Lorenz attractor or the Rössler attractor are
examples of strange attractors, limit sets on which the dynamics can have positive topological
entropy and is therefore no longer integrable. The theorem also does not hold any more if $\mathbb{R}^2$
is replaced by the 2-dimensional torus $\mathbb{T}^2$ because there can be recurrent non-periodic orbits
and even weak mixing situations can occur generically in smooth situations. The proof of the
Poincaré-Bendixson theorem relies on the Jordan curve theorem which states that a simple
closed curve has an interior and exterior in $\mathbb{R}^2$. See [132] [368].


64. APPROXIMATION THEORY 


A function $f$ on a closed interval $I = [a, b]$ is called continuous if for every $\epsilon > 0$ there exists
a $\delta > 0$ such that if $|x - y| < \delta$ then $|f(x) - f(y)| < \epsilon$. In the space $X = C(I)$ of all continuous
functions, one can define a distance $d(f, g) = \max_{x \in I} |f(x) - g(x)|$. A subset $Y$ of $X$ is called
dense if for every $\epsilon > 0$ and every $x \in X$, there exists $y \in Y$ with $d(x, y) < \epsilon$. Let $P$ denote
the class of polynomials in $X$. The Weierstrass approximation theorem states:


Theorem: Polynomials $P$ are dense in continuous functions $C(I)$.


The Weierstrass theorem was proven in 1885 by Karl Weierstrass. A constructive proof
suggested by Sergey Bernstein in 1912 uses, for $I = [0, 1]$, the Bernstein polynomials
$f_n(x) = \sum_{k=0}^{n} f(k/n) B_{k,n}(x)$ with $B_{k,n}(x) = B(n, k) x^k (1 - x)^{n-k}$, where $B(n, k)$ denote
the Binomial coefficients. The result
has been generalized to compact Hausdorff spaces $X$ and more general subalgebras of $C(X)$.
The Stone-Weierstrass approximation theorem was proven by Marshall Stone in 1937
and simplified in 1948 by Stone. In the complex, there is Runge’s theorem from 1885
approximating functions holomorphic on a bounded region $G$ with rational functions uniformly
on a compact subset $K$ of $G$ and Mergelyan’s theorem from 1951 allowing approximation
uniformly on a compact subset with polynomials if the region $G$ is simply connected. In
numerical analysis one has the task to approximate a given function space by functions from a
simpler class. Examples are approximations of smooth functions by polynomials or trigonometric
polynomials. There is also the interpolation problem of approximating a given data set with
polynomials or piecewise polynomials like splines or Bézier curves. See [510].
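
Bernstein’s construction is explicit enough to try out directly. The sketch below builds the Bernstein polynomial of a function on $[0, 1]$; the function name `bernstein_approximation` and the test function are assumptions chosen for illustration.

```python
from math import comb


def bernstein_approximation(f, n):
    """Return the n-th Bernstein polynomial f_n of a function f on [0, 1]."""
    def f_n(x):
        return sum(
            f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k)
            for k in range(n + 1)
        )
    return f_n


def max_error(f, g, samples=201):
    """Largest deviation |f - g| on an equally spaced grid in [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1))) for i in range(samples))


kink = lambda x: abs(x - 0.5)  # continuous but not differentiable at 1/2
error_10 = max_error(kink, bernstein_approximation(kink, 10))
error_200 = max_error(kink, bernstein_approximation(kink, 200))
```

The error shrinks as $n$ grows, but slowly near the kink, which reflects that Bernstein polynomials converge uniformly without being the best possible approximants.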


65. DIOPHANTINE APPROXIMATION 


An algebraic number is a root of a polynomial $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ with
integer coefficients $a_k$. A real number $x$ is called Diophantine if there exists $\epsilon > 0$ and a
positive constant $C$ such that the Diophantine condition $|x - p/q| > C/q^{2+\epsilon}$ is satisfied for
all $p$ and all $q > 0$. The Thue-Siegel-Roth theorem states:


Theorem: Any irrational algebraic number is Diophantine.


Hurwitz’s theorem from 1891 assures that for irrational $x$ there are infinitely many $p, q$ with
$|x - p/q| < C/q^2$ for $C = 1/\sqrt{5}$. This shows that the Thue-Siegel-Roth theorem can not be extended to
$\epsilon = 0$. The Hurwitz constant $C$ is optimal: for any $C < 1/\sqrt{5}$, the golden ratio
$x = (1 + \sqrt{5})/2$ has only finitely many $p, q$ with $|x - p/q| < C/q^2$. The set of Diophantine
numbers has full Lebesgue measure. A slightly larger set is the Brjuno set of all numbers
for which the continued fraction convergents $p_n/q_n$ satisfy $\sum_n \log(q_{n+1})/q_n < \infty$. A Brjuno
rotation number assures that the Siegel linearization theorem can still be proven. For quadratic
polynomials, Jean-Christophe Yoccoz showed that linearizability implies the rotation number
must be a Brjuno number. See [110].


66. ALMOST PERIODICITY 


If $\mu$ is a probability measure of compact support on $\mathbb{R}$, then $\hat{\mu}_n = \int e^{inx}\, d\mu(x)$ are the
Fourier coefficients of $\mu$. The Riemann-Lebesgue lemma tells that if $\mu$ is absolutely
continuous, then $\hat{\mu}_n$ goes to zero. The pure point part can be detected with the following
Wiener theorem:


Theorem: $\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} |\hat{\mu}_k|^2 = \sum_{x} |\mu(\{x\})|^2$.


This looks a bit like the Poisson summation formula $\sum_n f(n) = \sum_n \hat{f}(n)$, where $\hat{f}$ is the
Fourier transform of $f$. [The latter follows from $\sum_n e^{2\pi i n x} = \sum_n \delta(x - n)$, where $\delta(x)$ is a Dirac
delta function. The Poisson formula holds if $f$ is uniformly continuous and if both $f$ and $\hat{f}$
satisfy the growth condition $|f(x)| \leq C/(1 + |x|)^{1+\epsilon}$.] More generally, one can read off the
Hausdorff dimension from decay rates of the Fourier coefficients. See [376] [629].


67. SHADOWING 


Let $T$ be a diffeomorphism on a smooth Riemannian manifold $M$ with geodesic metric
$d$. A $T$-invariant set $K$ is called hyperbolic if for each $x \in K$, the tangent space $T_x M$ splits
into a stable and unstable bundle $E_x^+ \oplus E_x^-$ such that for some $0 < \lambda < 1$ and constant $C$,
one has $dT E_x^{\pm} = E_{Tx}^{\pm}$ and $|dT^{\pm n} v| \leq C \lambda^n$ for $v \in E^{\pm}$ and $n \geq 0$. An $\epsilon$-pseudo orbit is a sequence
$x_n$ of points in $M$ such that $x_{n+1} \in B_\epsilon(T(x_n))$, where $B_\epsilon$ is the geodesic ball of radius $\epsilon$. Two
sequences $x_n, y_n \in M$ are called $\delta$-close if $d(y_n, x_n) \leq \delta$ for all $n$. We say that a set $K$ has the
shadowing property, if there exists an open neighborhood $U$ of $K$ such that for all $\delta > 0$
there exists $\epsilon > 0$ such that every $\epsilon$-pseudo orbit of $T$ in $U$ is $\delta$-close to a true orbit of $T$.


Theorem: Every hyperbolic set has the shadowing property. 


This is only interesting for infinite $K$, as if $K$ is a finite periodic hyperbolic orbit, then the orbit
itself is the shadowing orbit. It is interesting however for a hyperbolic invariant set like a Smale
horseshoe or in the Anosov case, which is the situation when the entire manifold is hyperbolic.
See [368].


68. PARTITION FUNCTION 


Let $p(n)$ denote the number of ways we can write $n$ as a sum of positive integers without
distinguishing the order. For example, $p(4) = 5$ because $4 = 1 + 3 = 2 + 2 = 1 + 1 + 2 =
1 + 1 + 1 + 1$ can be written in 5 different ways as a sum of positive integers, counting the
trivial sum 4 itself. Euler used
its generating function, which is $\sum_{n=0}^{\infty} p(n) x^n = \prod_{k=1}^{\infty} (1 - x^k)^{-1}$. The reciprocal function
$\prod_{k=1}^{\infty} (1 - x^k) = (1 - x)(1 - x^2)(1 - x^3) \cdots$ is called the Euler function and generates the generalized
Pentagonal number theorem $\sum_{k \in \mathbb{Z}} (-1)^k x^{k(3k-1)/2} = 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \cdots$,
leading to the recursion $p(n) = p(n-1) + p(n-2) - p(n-5) - p(n-7) + p(n-12) + p(n-15) - \cdots$.
The Jacobi triple product identity is


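Theorem: $\prod_{m=1}^{\infty} (1 - x^{2m})(1 + x^{2m-1} y^2)(1 + x^{2m-1} y^{-2}) = \sum_{n=-\infty}^{\infty} x^{n^2} y^{2n}$.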


The formula was found in 1829 by Jacobi. For $x = z\sqrt{z}$ and $y^2 = -\sqrt{z}$ the identity reduces to
the pentagonal number theorem of Euler. See [19].
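
The pentagonal number recursion gives a fast way to tabulate $p(n)$. The sketch below is one possible implementation; the function name `partition_numbers` is hypothetical.

```python
def partition_numbers(n_max):
    """Return [p(0), ..., p(n_max)] using Euler's pentagonal number recursion."""
    p = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        total, k = 0, 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1          # signs follow the pattern + + - - + + ...
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:            # second generalized pentagonal number for k
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p


table = partition_numbers(100)
```

The table starts 1, 1, 2, 3, 5, 7, 11, and the number of terms needed for each $p(n)$ grows only like $\sqrt{n}$.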


69. BURNSIDE LEMMA 


If $G$ is a finite group acting on a finite set $X$, let $X/G$ denote the set of disjoint orbits
and $X^g = \{x \in X \mid g.x = x\}$ the fixed point set of elements which are fixed by $g \in G$.
The number $|X/G|$ of orbits, the group order $|G|$ and the sizes of the fixed point sets
are related by the Burnside lemma:


Theorem: $|X/G| = \frac{1}{|G|} \sum_{g \in G} |X^g|$.


The result was first proven by Frobenius in 1887. Burnside popularized it in 1897 [101].
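
A standard illustration, not taken from the text, is counting necklaces: colorings of $n$ beads with $k$ colors up to rotation. The cyclic group of order $n$ acts by rotation, and the rotation by $r$ steps fixes $k^{\gcd(r, n)}$ colorings. The sketch below compares the Burnside count with a brute-force orbit count; both function names are hypothetical.

```python
from itertools import product
from math import gcd


def necklaces_burnside(n, k):
    """Number of rotation classes of k-colorings of n beads via the Burnside lemma."""
    return sum(k ** gcd(r, n) for r in range(n)) // n


def necklaces_bruteforce(n, k):
    """Count the orbits directly by marking all rotations of each coloring."""
    seen, orbits = set(), 0
    for word in product(range(k), repeat=n):
        if word not in seen:
            orbits += 1
            for r in range(n):
                seen.add(word[r:] + word[:r])
    return orbits
```

Agreement of the two counts for several small $n$ and $k$ is a useful check that the fixed point sets were counted correctly.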


70. TAYLOR SERIES 


A complex-valued function $f$ which is analytic in a disc $D = D_r(a) = \{|x - a| < r\}$ can be
written as a series involving the $n$’th derivatives $f^{(n)}(a)$ of $f$ at $a$. If $f$ is real valued on the real
axis, the function is called real analytic in $(a - r, a + r)$. In several dimensions we can use
multi-index notation $a = (a_1, \dots, a_d)$, $n = (n_1, \dots, n_d)$, $x = (x_1, \dots, x_d)$ and $x^n = x_1^{n_1} \cdots x_d^{n_d}$
and $f^{(n)}(x) = \partial_{x_1}^{n_1} \cdots \partial_{x_d}^{n_d} f(x)$ and use a polydisc $D = D_r(a) = \{|x_1 - a_1| < r_1, \dots, |x_d - a_d| < r_d\}$.
The Taylor series formula is:


Theorem: For analytic $f$ in $D$, $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$.


Here, $T_r(a) = \{|x_1 - a_1| = r_1, \dots, |x_d - a_d| = r_d\}$ is the boundary torus. For example, for
$f(x) = \exp(x)$, where $f^{(n)}(0) = 1$, one has $f(x) = \sum_{n=0}^{\infty} x^n/n!$. Using the differential
operator $Df(x) = f'(x)$, one can see $f(x + t) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x)}{n!} t^n = e^{Dt} f(x)$ as a solution of
the transport equation $f_t = Df$. One can also represent $f$ by the Cauchy formula for
polydiscs, $f(a) = \frac{1}{(2\pi i)^d} \int_{T_r(a)} \frac{f(z)}{(z_1 - a_1) \cdots (z_d - a_d)}\, dz$, integrating along the boundary torus. Finite Taylor
series hold in the case if $f$ is $m + 1$ times differentiable. In that case one has a finite series
$S(x) = \sum_{n=0}^{m} \frac{f^{(n)}(a)}{n!} (x - a)^n$ such that the Lagrange rest term is $f(x) - S(x) = R(x) =
f^{(m+1)}(\xi)(x - a)^{m+1}/((m + 1)!)$, where $\xi$ is between $x$ and $a$. This generalizes the mean value
theorem in the case $m = 0$, where $f$ is only differentiable. The remainder term can also be
written as $\int_a^x f^{(m+1)}(s)(x - s)^m/m!\, ds$. Brook Taylor did state but not justify the formula in
1715. In 1742 Colin Maclaurin used the modern form. See [419].
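
The Lagrange rest term gives a computable error bound. For $f = \exp$ and $a = 0$, every derivative is $\exp$ itself, so $|R(x)| \leq e^{\max(x, 0)} |x|^{m+1}/(m+1)!$. The sketch below checks this bound numerically; the function names are hypothetical.

```python
from math import exp, factorial


def taylor_exp(x, m):
    """Finite Taylor series S(x) of exp around a = 0 up to degree m."""
    return sum(x**n / factorial(n) for n in range(m + 1))


def lagrange_bound(x, m):
    """Upper bound for |exp(x) - S(x)| from the Lagrange rest term."""
    return exp(max(x, 0.0)) * abs(x) ** (m + 1) / factorial(m + 1)


error = abs(exp(1.0) - taylor_exp(1.0, 10))
bound = lagrange_bound(1.0, 10)
```

With eleven terms the series already reproduces $e$ to about seven decimal places, and the observed error stays below the bound.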


71. ISOPERIMETRIC INEQUALITY


Let $S$ be a smooth surface in $\mathbb{R}^n$ homeomorphic to a sphere and bounding a region $B$. Assume
that the surface area $|S|$ is fixed. How large can the volume $|B|$ of $B$ become? If $B_1$ is the
unit ball with volume $|B_1|$, the answer is given by the isoperimetric inequality:


Theorem: $n^n |B|^{n-1} \leq |S|^n / |B_1|$.


If $B = B_1$, this gives $n|B| \leq |S|$, which is an equality, as then the volume of the ball is $|B| =
\pi^{n/2}/\Gamma(n/2 + 1)$ and the surface area of the sphere is $|S| = n\pi^{n/2}/\Gamma(n/2 + 1)$, which Archimedes
first got in the case $n = 3$, where $|S| = 4\pi$ and $|B| = 4\pi/3$. The classical isoperimetric
problem is $n = 2$, where we are in the plane $\mathbb{R}^2$. The inequality tells then $4|B| \leq |S|^2/\pi$,
which means $4\pi\,\mathrm{Area} \leq \mathrm{Length}^2$. The ball $B_1$ with area $\pi$ maximizes the functional. For
$n = 3$, with usual Euclidean space $\mathbb{R}^3$ and $|S| = 4\pi$, the inequality tells $|B|^2 \leq (4\pi)^3/(27 \cdot 4\pi/3)$, which is
$|B| \leq 4\pi/3$. The first proof in the case $n = 2$ was attempted by Jakob Steiner in 1838 using
the Steiner symmetrization process, which is a refinement of the Archimedes-Cavalieri
principle. In 1902 a proof by Hurwitz was given using Fourier series. The result has been
extended to geometric measure theory [218]. One can also look at the discrete problem to
maximize the area defined by a polygon: if $\{(x_i, y_i), i = 0, \dots, n - 1\}$ are the points of the
polygon, then the area is given by Green’s formula as $A = \frac{1}{2} \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)$ and the length
is $L = \sum_{i=0}^{n-1} \sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2}$ with $(x_n, y_n)$ identified with $(x_0, y_0)$. The Lagrange
equations for $A$ under the constraint $L = 1$ together with a fix of $(x_0, y_0)$ and $(x_1, y_1) = (1/n, 0)$
produce two maxima which are both regular polygons. A generalization to $n$-dimensional
Riemannian manifolds is given by the Lévy-Gromov isoperimetric inequality.
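
The discrete problem is easy to experiment with. The sketch below computes area and length of a polygon with the formulas above and evaluates the ratio $4\pi A/L^2$, which the planar inequality bounds by 1; the function names are hypothetical.

```python
from math import cos, hypot, pi, sin


def polygon_area_and_length(points):
    """Area (Green's formula) and perimeter of a closed polygon given as (x, y) pairs."""
    n = len(points)
    twice_area, length = 0.0, 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        length += hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2, length


def isoperimetric_ratio(points):
    """The quotient 4*pi*A / L^2, which is at most 1 with equality only for the circle."""
    area, length = polygon_area_and_length(points)
    return 4 * pi * area / length**2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
regular_100 = [(cos(2 * pi * k / 100), sin(2 * pi * k / 100)) for k in range(100)]
```

The square gives the ratio $\pi/4$, while a regular polygon with many sides comes close to the extremal value 1.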


72. RIEMANN ROCH 


A Riemann surface is a one-dimensional complex manifold. It is a two-dimensional real
analytic manifold but it has also a complex structure forcing it to be orientable for example.
Let $G$ be a compact connected Riemann surface of Euler characteristic $\chi(G) = 1 - g$, where
$g = b_1(G)$ is the genus, the number of handles of $G$ (and $1 = b_0(G)$ indicates that we have
only one connected component). A divisor $D = \sum_i a_i z_i$ on $G$ is an element of the free Abelian
group on the points of the surface. These are finite formal sums of points $z_i$ in $G$, where $a_i \in \mathbb{Z}$
is the multiplicity of the point $z_i$. The degree of the divisor is defined as $\deg(D) = \sum_i a_i$. Let
us write $\chi(D) = \deg(D) + \chi(G) = \deg(D) + 1 - g$ and call this the Euler characteristic of
the divisor $D$, as one can see a divisor as a geometric object by itself generalizing the complex
manifold $G$ (which is the case $D = 0$). A meromorphic function $f$ on $G$ defines the
principal divisor $(f) = \sum_i a_i z_i - \sum_j b_j w_j$, where $a_i$ are the multiplicities of the roots $z_i$ of
$f$ and $b_j$ the multiplicities of the poles $w_j$ of $f$. The principal divisor of a global meromorphic
1-form $dz$ is called the canonical divisor $K$. Let $l(D)$ be the dimension of the linear
space of meromorphic functions $f$ on $G$ for which $(f) + D \geq 0$. (The notation $\geq 0$ means that
all coefficients are non-negative. One calls such a divisor effective.) The Riemann-Roch
theorem is


Theorem: $l(D) - l(K - D) = \chi(D)$


The idea of a Riemann surface was introduced by Bernhard Riemann. Riemann-Roch was proven
for Riemann surfaces by Bernhard Riemann in 1857 and Gustav Roch in 1865. It is possible
to see this as an Euler-Poincaré type relation by identifying the left hand side as a signed
cohomological Euler characteristic and the right hand side as a combinatorial Euler
characteristic. There are various generalizations, to arithmetic geometry or to higher dimensions. See


271 BSG]. 


73. OPTIMAL TRANSPORT 


Given two probability spaces $(X, P)$, $(Y, Q)$ and a continuous cost function $c : X \times Y \to [0, \infty]$,
the optimal transport problem or Monge-Kantorovich minimization problem is to find
the minimum of $\int_X c(x, T(x))\, dP(x)$ among all coupling transformations $T : X \to Y$ which
have the property that they transport the measure $P$ to the measure $Q$. More generally, one
looks at a measure $\pi$ on $X \times Y$ such that the projection of $\pi$ onto $X$ is $P$ and the projection
of $\pi$ onto $Y$ is $Q$. The function to optimize is then $I(\pi) = \int_{X \times Y} c(x, y)\, d\pi(x, y)$. One of the
fundamental results is that optimal transport exists. The technical assumption is that the
two probability spaces $X, Y$ are Polish (= separable complete metric spaces) and that the cost
function $c$ is continuous.


Theorem: For continuous cost functions $c$, there exists a minimum of $I$.


In the simple set-up of probability spaces, this just follows from the compactness (the Alaoglu
theorem for balls in the weak star topology of a Banach space) of the set of probability measures:
any sequence $\pi_n$ of probability measures on $X \times Y$ has a convergent subsequence. Since $I$ is
continuous, picking a sequence $\pi_n$ with $I(\pi_n)$ decreasing produces a minimum. The problem
was formalized in 1781 by Gaspard Monge and worked on by Leonid Kantorovich. Hiroshi
Tanaka in the 1970s produced connections with partial differential equations like the Boltzmann
equation. There are also connections to weak KAM theory in the form of Aubry-Mather
theory. The above existence result is true under substantially less regularity. The question of
uniqueness or the existence of a Monge coupling given in the form of a transformation $T$ is
subtle [672].
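
The Monge formulation can be made concrete in the simplest finite setting. The following sketch assumes that $X$ and $Y$ are finite sets of $n$ points on the real line, each carrying the uniform measure, so that the transport maps $T$ are exactly the $n!$ bijections; the function names and the sample data are illustrative and not taken from the literature cited above. Existence of a minimum is trivial here, and the point is only to show what is being minimized.

```python
from itertools import permutations

def transport_cost(xs, ys, perm, cost):
    """Average cost of the map T sending xs[i] to ys[perm[i]] (uniform weights 1/n)."""
    n = len(xs)
    return sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n

def optimal_monge_map(xs, ys, cost):
    """Brute-force minimum over all n! bijections; only sensible for tiny n."""
    n = len(xs)
    best = min(permutations(range(n)),
               key=lambda p: transport_cost(xs, ys, p, cost))
    return best, transport_cost(xs, ys, best, cost)

def quadratic(x, y):
    """Cost c(x, y) = |x - y|^2."""
    return (x - y) ** 2

xs = [0.0, 1.0, 3.0]   # support of P (hypothetical sample data)
ys = [2.0, 5.0, 4.0]   # support of Q (hypothetical sample data)
perm, value = optimal_monge_map(xs, ys, quadratic)
print(perm, value)     # the optimal map pairs the points in increasing order
```

For this quadratic cost on the line, the minimizer found is the monotone pairing 0 to 2, 1 to 4, 3 to 5. The Kantorovich relaxation to measures $\pi$ on $X \times Y$ would replace the search over permutations by a linear program, which is not attempted here.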


74. STRUCTURE FROM MOTION 


We are given $m$ hyperplanes in $\mathbb{R}^d$ serving as retinas or photographic plates for affine cameras and
$n$ points in $\mathbb{R}^d$. The affine structure from motion problem is to understand under which
conditions it is possible to recover both the points and planes when knowing the orthogonal
projections onto the planes. It is a model problem for the task of reconstructing both the scene
and the camera positions if the scene has $n$ points and $m$ camera pictures were taken.
Ullman's theorem is a prototype result with $m = 3$ different cameras and $n = 3$ points which are
not collinear. Other setups are perspective cameras or omni-directional cameras. The
Ullman map $F$ is a nonlinear map from $\mathbb{R}^{2d} \times SO_d^2$ to $(\mathbb{R}^{3d-3})^2$ which is a map between equal
dimensional spaces if $d = 2$ and $d = 3$. The group $SO_d$ is the rotation group in $\mathbb{R}^d$ describing the
possible ways in which the affine camera can be positioned. Affine cameras capture the same
picture when translated, so that the planes can all go through the origin. In the case $d = 2$, we
get a map from $\mathbb{R}^4 \times SO_2^2$ to $\mathbb{R}^6$ and in the case $d = 3$, $F$ maps $\mathbb{R}^6 \times SO_3^2$ into $\mathbb{R}^{12}$.


Theorem: The structure from motion map is locally invertible. 


In the case d = 2, there is a reflection ambiguity. In dimension d = 3, the number of ambiguities
is typically 64. Ullman's theorem appeared in 1979 in [664]. Ullman states the theorem for d = 3
with 4 points, as adding a fourth point cuts the number of ambiguities from 64 to 2. See [403],
where both in dimension d = 2 and d = 3 the Jacobian dF of the Ullman map is seen to be invertible
and the inverse of F is given explicitly. For structure from motion problems in computer vision
in general, see [659]. In applications one takes n and m large and reconstructs both
the points and the camera parameters using statistical data fitting.


75. POISSON EQUATION 


What functions $u$ solve the Poisson equation $-\Delta u = f$, a partial differential equation? The
solution can be written down for $f \in L^1$ as $K_f(x) = \int_{\mathbb{R}^n} G(x,y) f(y) \, dy + h$, where $h$ is
harmonic. If $f = 0$, then the Poisson equation is the Laplace equation. The function $G(x, y)$
is the Green's function, an integral kernel. It satisfies $-\Delta G(x, y) = \delta(y - x)$, where $\delta$ is the
Dirac delta function, a distribution. It is given by $G(x, y) = -\log|x - y|/(2\pi)$ for $n = 2$ or
$G(x, y) = |x - y|^{-1}/(4\pi)$ for $n = 3$. In elliptic regularity theory, one replaces the Laplacian
$-\Delta$ with an elliptic second order differential operator $L = A(x) \cdot D \cdot D + b(x) \cdot D + V(x)$
where $D = \nabla$ is the gradient, $A$ is a positive definite matrix, $b(x)$ is a vector field and $V$ is
a scalar field.


Theorem: For $f \in L^p$ with $p > n$, the function $K_f$ is differentiable.


The result is much more general and can be extended. If $f$ is in $C^k$ and has compact support,
for example, then $K_f$ is in $C^{k+1}$. An example of the more general set-up is the Schrödinger
operator $L = -\Delta + V(x) - E$. The solution to $Lu = 0$ then solves an eigenvalue problem.
As one looks for solutions in $L^2$, the solution only exists if $E$ is an eigenvalue of $-\Delta + V$. The
Euclidean space $\mathbb{R}^n$ can be replaced by a bounded domain $\Omega$ of $\mathbb{R}^n$ where one can look at
boundary conditions of Dirichlet or Neumann type. Or one can look at the situation
on a general Riemannian manifold $M$ with or without boundary. On a Hilbert space, one has
then Fredholm theory. The equation $u(x) = \int G(x,y) f(y) \, dy$ is called a Fredholm integral
equation and $\det(1 - sG) = \exp(-\sum_n s^n \mathrm{tr}(G^n)/n)$ the Fredholm determinant leading to
the zeta function $1/\det(1 - sG)$. See [557] [447].


76. FOUR SQUARE THEOREM 


Waring's problem asked whether there exists for every $k$ an integer $g(k)$ such that every
positive integer can be written as a sum of $g(k)$ powers $x_1^k + \cdots + x_{g(k)}^k$. Obviously $g(1) = 1$.
David Hilbert proved in 1909 that $g(k)$ is finite. This is the Hilbert-Waring theorem. The
following theorem of Lagrange tells that $g(2) = 4$:


Theorem: Every positive integer is a sum of four squares.


The result needs only to be verified for prime numbers as $N(a,b,c,d) = a^2 + b^2 + c^2 + d^2$
is a norm for quaternions $q = (a,b,c,d)$ which has the property $N(pq) = N(p)N(q)$. This
property can be seen also as a Cauchy-Binet formula, when writing quaternions as complex
$2 \times 2$ matrices. The four-square theorem had been conjectured already by Diophantus, but
was proven first by Lagrange in 1770. The case $g(3) = 9$ was done by Wieferich in 1909, with a
gap closed by Kempner in 1912. It is
conjectured that $g(k) = 2^k + [(3/2)^k] - 2$, where $[x]$ is the integral part of a real number. See
[166] [167] [333].
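
The reduction to primes rests on the multiplicativity of the quaternion norm. The sketch below checks that identity on integer quaternions and finds a four-square representation by exhaustive search; it is an illustration under the assumption that squares of 0 are allowed, not an efficient algorithm, and the function names are chosen here for illustration.

```python
from math import isqrt

def four_squares(n):
    """Return (a, b, c, d) with a >= b >= c >= d >= 0 and a^2+b^2+c^2+d^2 == n."""
    for a in range(isqrt(n), -1, -1):
        ra = n - a * a
        for b in range(min(a, isqrt(ra)), -1, -1):
            rb = ra - b * b
            for c in range(min(b, isqrt(rb)), -1, -1):
                rc = rb - c * c
                d = isqrt(rc)
                if d <= c and d * d == rc:
                    return (a, b, c, d)
    raise ValueError("unreachable for n >= 0 by Lagrange's theorem")

def quaternion_product(p, q):
    """Hamilton product of p = a1 + b1 i + c1 j + d1 k and q, as 4-tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def norm(q):
    """N(a, b, c, d) = a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

p, q = four_squares(7), four_squares(13)
product = quaternion_product(p, q)
print(p, q, product, norm(product))   # the last number is 7 * 13 = 91
```

Multiplying representations of two primes in this way produces a representation of their product, which is the step that lets the proof concentrate on primes.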


77. KNOTS


A knot is a closed curve in $\mathbb{R}^3$, an embedding of the circle in three dimensional Euclidean
space. One also draws knots in the 3-sphere $S^3$. As the knot complement $S^3 - K$ of a knot
$K$ characterizes the knot up to mirror reflection, the theory of knots is part of 3-manifold
theory. The HOMFLYPT polynomial $P$ of a knot or link $K$ is defined recursively using
skein relations $l P(L_+) + l^{-1} P(L_-) + m P(L_0) = 0$. Let $K \# L$ denote the knot sum which
is a connected sum. Oriented knots form with this operation a commutative monoid with
the unknot as unit. It features a unique prime factorization. The unknot has $P(K) = 1$, the
two-component unlink has $P(K) = -(l + l^{-1})/m$, as the skein relation applied to a single
crossing shows. The trefoil knot has $P(K) = -2l^2 - l^4 + l^2 m^2$ for one of its two mirror images.


Theorem: P(K#L) = P(K)P(L). 


The Alexander polynomial was discovered in 1928 and initiated classical knot theory. John
Conway showed in the 1960s how to compute the Alexander polynomial using recursive skein
relations (skein comes from French escaigne = hank of yarn). The Alexander polynomial allows
one to compute an invariant for knots by looking at the projection. The Jones polynomial found
by Vaughan Jones came in 1984. This is generalized by the HOMFLYPT polynomial named 
after Jim Hoste, Adrian Ocneanu, Kenneth Millett, Peter J. Freyd and W.B.R. Lickorish from 
1985 and J. Przytycki and P. Traczyk from 1987. See [5]. Further invariants are Vassiliev 
invariants of 1990 and Kontsevich invariants of 1993. 


78. HAMILTONIAN DYNAMICS 


Given a probability space $(M, \mathcal{A}, m)$ and a smooth Lie manifold $N$ with potential function
$V : N \to \mathbb{R}$, the Vlasov Hamiltonian differential equations on all maps $X = (f,g) : M \to
T^*N$ are $f' = g$, $g' = \int_M \nabla V(f(x) - f(y)) \, dm(y)$. Starting with $X_0 = \mathrm{Id}$, we get a flow $X_t$
and by push forward an evolution $P^t = X_t^* m$ of probability measures on $N$. The Vlasov
integro-differential equations on measures in $T^*N$ are
$\dot{P}^t(x,y) + y \cdot \nabla_x P^t(x,y) - W(x) \cdot \nabla_y P^t(x,y) = 0$
with $W(x) = \int_M \nabla_x V(x - x') P^t(x', y') \, dy' dx'$. Note that while $X_t$ solves an infinite dimensional
ordinary differential equation evolving maps $M \to T^*N$, the path $P^t$ solves an
integro-differential equation describing the evolution of measures on $T^*N$.


Theorem: If $X_t$ solves the Vlasov Hamiltonian, then $P^t = X_t^* m$ solves Vlasov.


This is a result which goes back to James Clerk Maxwell. Vlasov dynamics was introduced in
1938 by Anatoly Vlasov. An existence result was proven by W. Braun and Klaus Hepp in 1977.
The maps $X_t$ will stay perfectly smooth if smooth initially. However, even if $P^0$ is smooth,
the measure $P^t$ in general rather quickly develops singularities so that the partial differential
equation has only weak solutions. The analysis of $P^t$ directly would involve complicated
function spaces. The fundamental theorem of Vlasov dynamics therefore plays the role
of the method of characteristics in this field. If $M$ is a finite probability space, then the
Vlasov Hamiltonian system is the Hamiltonian n-body problem on $N$. Another example is
$M = T^*N$, where $m$ is an initial phase space measure. Now $X_t$ is a one parameter family of
diffeomorphisms $X_t : M \to T^*N$ pushing forward $m$ to a measure $P^t$ on the cotangent bundle.
If $M$ is a circle then $X^0$ defines a closed curve on $T^*N$. In particular, if $\gamma(t)$ is a curve in $N$ and
$X^0(t) = (\gamma(t), 0)$, we have a continuum of particles initially at rest which evolve by interacting
with a force $\nabla V$. About interacting particle dynamics, see [620].


79. HYPERCOMPLEXITY


A hypercomplex algebra is a finite dimensional algebra over $\mathbb{R}$ which is unital and
distributive. Up to isomorphism, the two-dimensional
hypercomplex algebras over the reals are the complex numbers $x + iy$ with $i^2 = -1$, the
split complex numbers $x + jy$ with $j^2 = 1$ and the dual numbers (the exterior algebra)
$x + \epsilon y$ with $\epsilon^2 = 0$. A division algebra over a field $F$ is an algebra over $F$ in which division
is possible. Wedderburn's little theorem tells that a finite division algebra must be a finite
field. $\mathbb{C}$ is the only two dimensional division algebra over $\mathbb{R}$. The following theorem of
Frobenius classifies the class $\mathcal{X}$ of finite dimensional associative division algebras over $\mathbb{R}$:


Theorem: $\mathcal{X}$ consists of the algebras $\mathbb{R}$, $\mathbb{C}$ and $\mathbb{H}$.


Hypercomplex numbers like quaternions, tessarines or octonions extend the algebra of
complex numbers. Cataloging them started with Benjamin Peirce's 1872 "Linear associative
algebra". Dual numbers were introduced in 1873 by William Clifford. The Cayley-Dickson
construction iteratively generates algebras of twice the dimension: the complex numbers
from the reals, the quaternions from the complex numbers or the octonions from the quaternions
(for octonions associativity is lost). The next step leads to sedenions but the latter are not
even an alternative algebra any more. The Hurwitz and Frobenius theorems limit the number
in the case of real normed division algebras. Ferdinand Georg Frobenius classified in 1877 the
finite-dimensional associative division algebras. Adolf Hurwitz proved in 1923 (posthumously)
that a unital finite dimensional real algebra endowed with a positive-definite quadratic form (a
real normed division algebra) must be $\mathbb{R}$, $\mathbb{C}$, $\mathbb{H}$ or $\mathbb{O}$. These four are the only Euclidean
Hurwitz algebras. In 1907, Joseph Wedderburn classified simple algebras (simple meaning
that there are no non-trivial two-sided ideals and so that $ab = 0$ implies $a = 0$ or $b = 0$). In 1958
J. Frank Adams showed topologically that $\mathbb{R}$, $\mathbb{C}$, $\mathbb{H}$, $\mathbb{O}$ are the only finite dimensional real division
algebras. In general, division algebras have dimension 1, 2, 4 or 8, as Michel Kervaire and Raoul
Bott and John Milnor showed in 1958 by relating the problem to the parallelizability of
spheres. The problem of classification of division algebras over a field $F$ led Richard Brauer
to the Brauer group $\mathrm{Br}(F)$, which Jean-Pierre Serre identified with the Galois cohomology group
$H^2(K, K^*)$, where $K^*$ is the multiplicative group of $K$ seen as an algebraic group. Each Brauer
equivalence class among central simple algebras (Brauer algebras) contains a unique division
algebra by the Artin-Wedderburn theorem. Examples: the Brauer group of an algebraically
closed field or finite field is trivial, the Brauer group of $\mathbb{R}$ is $\mathbb{Z}_2$. Brauer groups were later
defined for commutative rings by Maurice Auslander and Oscar Goldman and by Alexander
Grothendieck in 1968 for schemes. Ofer Gabber extended the Serre result to schemes with
ample line bundles. The finiteness of the Brauer group of a proper integral scheme is open. See


[35] DIB). 
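
The two-dimensional algebras and the Cayley-Dickson doubling can be experimented with directly. The sketch below is one possible illustration: it assumes the doubling rule $(a,b)(c,d) = (ac - d^*b, da + bc^*)$ with conjugation $(a,b)^* = (a^*, -b)$, which is one of several equivalent sign conventions, and it represents elements as tuples of integers. All function names are chosen here for illustration.

```python
def conj(x):
    """Cayley-Dickson conjugate: keep the real part, negate all other coordinates."""
    return (x[0],) + tuple(-t for t in x[1:])

def add(x, y):
    return tuple(s + t for s, t in zip(x, y))

def neg(x):
    return tuple(-t for t in x)

def mul(x, y):
    """Product in dimension 2^k via (a,b)(c,d) = (ac - d*b, da + bc*)."""
    if len(x) == 1:
        return (x[0] * y[0],)
    h = len(x) // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    return add(mul(a, c), neg(mul(conj(d), b))) + add(mul(d, a), mul(b, conj(c)))

def norm(x):
    return sum(t * t for t in x)

def basis(dim, i):
    return tuple(1 if j == i else 0 for j in range(dim))

def mul2(x, y, s):
    """Product in the two-dimensional algebra with basis 1, e and e*e = s."""
    return (x[0] * y[0] + s * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

# Two-dimensional cases: s = -1 (complex), s = 1 (split complex), s = 0 (dual).
print(mul2((0, 1), (0, 1), -1))   # i*i = -1
print(mul2((1, 1), (1, -1), 1))   # (1+j)(1-j) = 0: zero divisors, no division algebra
print(mul2((0, 1), (0, 1), 0))    # eps*eps = 0

# Quaternions (dimension 4): not commutative.
i, j, k = basis(4, 1), basis(4, 2), basis(4, 3)
print(mul(i, j) == k, mul(j, i) == neg(k))

# Octonions (dimension 8): associativity is lost.
e1, e2, e4 = basis(8, 1), basis(8, 2), basis(8, 4)
print(mul(mul(e1, e2), e4), mul(e1, mul(e2, e4)))
```

The zero divisor $(1+j)(1-j) = 0$ shows concretely why the split complex numbers are not a division algebra, and the last line exhibits a triple of octonion units whose two bracketings differ by a sign.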


80. APPROXIMATION 


The Kolmogorov-Arnold superposition theorem shows that continuous functions in $C(\mathbb{R}^n)$
of several variables can be written as a composition of continuous functions of two variables:


Theorem: Every $f \in C(\mathbb{R}^n)$ is a composition of continuous functions in $C(\mathbb{R}^2)$.


More precisely, it has been known since 1962 that there exist functions $f_{k,l}$ and a function $g$
in $C(\mathbb{R})$ such that $f(x_1, \dots, x_n) = \sum_{k=0}^{2n} g(f_{k,1}(x_1) + \cdots + f_{k,n}(x_n))$. As one can write finite
sums using functions of two variables like $h(x,y) = x + y$ or $h(x + y, z) = x + y + z$, two
variables suffice. The above form was given by George Lorentz in 1962. Andrei Kolmogorov
reduced the problem in 1956 to functions of three variables. Vladimir Arnold then showed (as a
student at Moscow State University) in 1957 that one can do with two variables. The problem
came from a more specific problem in algebra: finding roots of a polynomial
$p(x) = x^n + a_1 x^{n-1} + \cdots + a_n$ using radicals and arithmetic operations in the coefficients is not
possible in general for $n \geq 5$. Erland Samuel Bring showed in 1786 that a quintic can be reduced
to $x^5 + ax + 1$. In 1836 William Rowan Hamilton showed that the sextic can be reduced to
$x^6 + ax^2 + bx + 1$, the septic to $x^7 + ax^3 + bx^2 + cx + 1$ and the degree 8 to a 4 parameter problem
$x^8 + ax^4 + bx^3 + cx^2 + dx + 1$. Hilbert conjectured that one can not do better. These are
Hilbert's 13th problem, the sextic conjecture and octic conjecture. In 1957, Arnold and
Kolmogorov showed that no topological obstructions exist to reduce the number of variables.
Important progress was made in 1975 by Richard Brauer. Some history is given in [216].


81. DETERMINANTS 


The determinant of an $n \times n$ matrix $A$ is defined as the sum $\sum_\pi \mathrm{sign}(\pi) A_{1\pi(1)} \cdots A_{n\pi(n)}$,
where the sum is over all $n!$ permutations $\pi$ of $\{1, \dots, n\}$ and $\mathrm{sign}(\pi)$ is the signature of
the permutation $\pi$. The determinant functional satisfies the product formula $\det(AB) =
\det(A)\det(B)$. As the determinant is the constant coefficient of the characteristic
polynomial $p_A(x) = \det(A - x1) = p_0(-x)^n + p_1(-x)^{n-1} + \cdots + p_k(-x)^{n-k} + \cdots + p_n$ of $A$, one can
get the coefficients of the product $F^T G$ of two $n \times m$ matrices $F, G$ as follows:


Theorem: $p_k = \sum_{|P|=k} \det(F_P) \det(G_P)$.


The right hand side is a sum over all minors of length $k$ including the empty one $|P| =
0$, where $\det(F_P)\det(G_P) = 1$. This implies $\det(1 + F^T G) = \sum_P \det(F_P)\det(G_P)$ and so
$\det(1 + F^T F) = \sum_P \det^2(F_P)$. The classical Cauchy-Binet theorem is the special case $k = m$,
where $\det(F^T G) = \sum_P \det(F_P)\det(G_P)$ is a sum over all $m \times m$ patterns if $n \geq m$. It has as an even
more special case the Pythagorean consequence $\det(A^T A) = \sum_P \det^2(A_P)$. The determinant
product formula is the even more special case when $n = m$. See [318].
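
The identity $\det(1 + F^T G) = \sum_P \det(F_P)\det(G_P)$ is easy to test on small integer matrices. The following sketch assumes that a pattern $P$ means a choice of $k$ rows and $k$ columns, and it uses the permutation sum from the definition above rather than any numerical library; the sample matrices and function names are arbitrary illustrations.

```python
from itertools import combinations, permutations

def det(a):
    """Sum over permutations of sign(pi) * a[0][pi(0)] * ... ; the empty matrix gives 1."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def minor(a, rows, cols):
    return [[a[i][j] for j in cols] for i in rows]

def transpose_times(f, g):
    """The m x m matrix F^T G for two n x m matrices given as lists of rows."""
    n, m = len(f), len(f[0])
    return [[sum(f[r][i] * g[r][j] for r in range(n)) for j in range(m)]
            for i in range(m)]

def minor_sum(f, g, k):
    """Sum of det(F_P) det(G_P) over all k x k patterns P."""
    n, m = len(f), len(f[0])
    return sum(det(minor(f, rows, cols)) * det(minor(g, rows, cols))
               for rows in combinations(range(n), k)
               for cols in combinations(range(m), k))

F = [[1, 2], [3, 4], [5, 6]]   # n = 3, m = 2
G = [[1, 0], [2, 1], [0, 3]]
product = transpose_times(F, G)
one_plus = [[product[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)]

print(det(one_plus), sum(minor_sum(F, G, k) for k in range(3)))   # both equal 4
print(det(product), minor_sum(F, G, 2))                            # classical case, both -26
```

In this example the three contributions are 1 for the empty pattern, 29 for the $1 \times 1$ patterns and $-26$ for the $2 \times 2$ patterns, and the last of these is the classical Cauchy-Binet value of $\det(F^T G)$.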


82. TRIANGLES


A triangle T on a two-dimensional surface S is defined by three points A, B, C joined by three 
geodesic paths. (It is assumed that the three geodesic paths have no self-intersections nor other 
intersections besides $A, B, C$ so that $T$ is a topological disk with a piecewise geodesic boundary).
If $\alpha, \beta, \gamma$ are the inner angles of a triangle $T$ located on a surface with curvature $K$, there
is the Gauss-Bonnet formula $\int_S K(x) \, dA(x) = 2\pi \chi(S)$, where $dA$ denotes the area element on
the surface. This implies a relation between the integral of the curvature over the triangle and
the angles:


Theorem: $\alpha + \beta + \gamma = \int_T K \, dA + \pi$


This can be seen as a special Gauss-Bonnet result for Riemannian manifolds with
boundary, as it is equivalent to $\int_T K \, dA + \alpha' + \beta' + \gamma' = 2\pi$ with complementary angles $\alpha' =
\pi - \alpha$, $\beta' = \pi - \beta$, $\gamma' = \pi - \gamma$. One can think of the vertex contributions as boundary
curvatures (generalized functions). In the case of constant curvature $K$, the formula becomes
$\alpha + \beta + \gamma = KA + \pi$, where $A$ is the area of the triangle. Since antiquity, one knows the
flat case $K = 0$, where $\pi = \alpha + \beta + \gamma$ is taught in elementary school. On the unit sphere
this is $\alpha + \beta + \gamma = A + \pi$, a result of Albert Girard which was predated by Thomas Harriot.
In the Poincaré disk model $K = -1$, this is $\alpha + \beta + \gamma = -A + \pi$, which is usually stated as:
the area of a triangle in the disk is $\pi - \alpha - \beta - \gamma$. This was proven by Johann Heinrich
Lambert. See [94] for spherical geometry and [18] for hyperbolic geometry, which are both part
of non-Euclidean geometry and now part of Riemannian geometry. [54] [358]
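
Girard's formula on the unit sphere can be checked on the triangle cut out by one octant, whose area is one eighth of the total area $4\pi$. The sketch below computes the inner angles from the vertex coordinates, assuming the vertices are unit vectors in $\mathbb{R}^3$ and the sides are the shorter great circle arcs; the helper names are chosen for illustration.

```python
from math import acos, pi, sqrt

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

def unit(u):
    length = sqrt(dot(u, u))
    return tuple(t / length for t in u)

def inner_angle(a, b, c):
    """Angle at vertex a between the great circle arcs ab and ac.

    It equals the angle between the normals of the planes through the origin
    containing (a, b) and (a, c).
    """
    n1, n2 = unit(cross(a, b)), unit(cross(a, c))
    return acos(max(-1.0, min(1.0, dot(n1, n2))))

def girard_area(a, b, c):
    """Area given by alpha + beta + gamma = A + pi on the unit sphere (K = 1)."""
    angles = inner_angle(a, b, c) + inner_angle(b, c, a) + inner_angle(c, a, b)
    return angles - pi

octant = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(girard_area(*octant), 4 * pi / 8)   # three right angles: 3*pi/2 - pi = pi/2
```

The three right angles sum to $3\pi/2$, which exceeds the flat value $\pi$ by exactly the area $\pi/2$ of the octant.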


83. KAM 


An area preserving map $T(x, y) = (2x - y + c f(x), x)$ has an orbit $(x_{n+1}, x_n)$ on $\mathbb{T}^2 = (\mathbb{R}/\mathbb{Z})^2$
which satisfies the recursion $x_{n+1} - 2x_n + x_{n-1} = c f(x_n)$. The 1-periodic function $f$ is assumed
to be real-analytic, non-constant, satisfying $\int_0^1 f(x) \, dx = 0$. In the case $f(x) = \sin(2\pi x)$, one
has the Standard map. When looking for invariant curves $(q(t + \alpha), q(t))$ with smooth $q$, we
seek a solution of the nonlinear equation $F(q) = q(t + \alpha) - 2q(t) + q(t - \alpha) - c f(q(t)) = 0$. For
$c = 0$, there is the solution $q(t) = t$. The linearization $dF(q)(u) = Lu = u(t + \alpha) - 2u(t) +
u(t - \alpha) - c f'(q(t)) u(t)$ is a bounded linear operator on $L^2(\mathbb{T})$ but not invertible for $c = 0$, so
that the implicit function theorem does not apply. The map $Lu = u(t + \alpha) - 2u(t) + u(t - \alpha)$
becomes after a Fourier transform the diagonal matrix $\hat{L} \hat{u}_n = [2\cos(n\alpha) - 2] \hat{u}_n$, which has the
inverse diagonal entries $[2\cos(n\alpha) - 2]^{-1}$ leading to small divisors. A real number $\alpha$ is called
Diophantine if there exists a constant $C > 0$ such that for all integers $p, q$ with $q \neq 0$, we have
$|\alpha - p/q| > C/q^2$. KAM theory assures that the solution $q(t) = t$ persists and remains smooth
if $c$ is small. By a solution, the theorem means a smooth solution. For real analytic $F$, it
can be real analytic. The following result is a special case of the twist map theorem.


Theorem: For Diophantine $\alpha$, there is a solution of $F(q) = 0$ for small $|c|$.


The KAM theorem was predated by the Poincaré-Siegel theorem in complex dynamics
which assured that if $f$ is analytic near $z = 0$ and $f'(0) = \lambda = \exp(2\pi i \alpha)$ with Diophantine
$\alpha$, then there exists $u(z) = z + q(z)$ such that $f(u(z)) = u(\lambda z)$ holds in a small disk around 0:
writing $f(z) = \lambda z + g(z)$, there
is an analytic solution $q$ to the Schröder equation $\lambda q(z) + g(z + q(z)) = q(\lambda z)$. The question
about the existence of invariant curves is important as it determines the stability. The twist
map theorem also follows from a strong implicit function theorem initiated by John
Nash and Jürgen Moser. For larger $c$, or non-Diophantine $\alpha$, the solution $q$ still exists but it is
no longer continuous. This is Aubry-Mather theory. For $c \neq 0$, the operator $L$ is an almost
periodic Toeplitz matrix on $l^2(\mathbb{Z})$ which is a special kind of discrete Schrödinger operator.
The decay rate of the off diagonals depends on the smoothness of $f$. Getting control of the
inverse can be technical [79]. Even in the Standard map case $f(x) = \sin(2\pi x)$, the composition
$f(q(t))$ is no longer a trigonometric polynomial so that $L$ appearing here is not a Jacobi matrix
in a strip. The first breakthrough of the theorem in the framework of Hamiltonian differential
equations was achieved in 1954 by Andrey Kolmogorov. Jürgen Moser proved the discrete twist
map version and Vladimir Arnold in 1963 proved the theorem for Hamiltonian systems. The
above stated result generalizes to higher dimensions where one looks for invariant tori called
KAM tori; one needs some non-degeneracy conditions. See [503]. For the story on
KAM, see [186].
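
The Diophantine condition can be probed numerically. The sketch below estimates the best constant $C$ in $|\alpha - p/q| \geq C/q^2$ over a finite range of denominators; it is only a finite check in floating point arithmetic and proves nothing about all $q$. The golden mean is used as the standard example of a badly approximable number, and the second number is an arbitrary perturbation of $1/3$ chosen for contrast.

```python
from math import sqrt

def diophantine_constant(alpha, qmax):
    """Minimum over 1 <= q <= qmax of q^2 * |alpha - p/q|, p the nearest integer to q*alpha."""
    best = float("inf")
    for q in range(1, qmax + 1):
        p = round(q * alpha)
        best = min(best, q * q * abs(alpha - p / q))
    return best

golden = (sqrt(5) - 1) / 2
nearly_rational = 1 / 3 + 1e-9

print(diophantine_constant(golden, 2000))           # stays near 0.38
print(diophantine_constant(nearly_rational, 2000))  # collapses to about 1e-8
```

For the golden mean the minimum over this range is attained at $q = 1$, and the values along the Fibonacci denominators approach $1/\sqrt{5}$, so the small divisors $[2\cos(n\alpha) - 2]^{-1}$ grow at a controlled rate. For a number very close to a rational, the constant is tiny and the divisors become correspondingly large.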


84. CONTINUED FRACTION


Given a positive square free integer $d$, the Diophantine equation $x^2 - dy^2 = 1$ is called Pell's
equation. Solving it means to find a nontrivial unit in the ring $\mathbb{Z}[\sqrt{d}]$ because $(x + y\sqrt{d})(x -
y\sqrt{d}) = 1$. The trivial solutions are $x = \pm 1$, $y = 0$. Solving the equation is therefore part of the
Dirichlet unit problem from algebraic number theory. Let $[a_0; a_1, \dots]$ denote the continued
fraction expansion of $x = \sqrt{d}$. This means $a_0 = [x]$ is the integer part and $[1/(x - a_0)] = a_1$
etc. If $x = [a_0; a_1, \dots, a_n + b_n]$, then $a_{n+1} = [1/b_n]$. Let $p_n/q_n = [a_0; a_1, a_2, \dots, a_n]$ denote the
$n$'th convergent to the regular continued fraction of $\sqrt{d}$. A nontrivial solution $(x_1, y_1)$ which
minimizes $x > 0$ is called the fundamental solution. The theorem tells that it is of the form $(p_n, q_n)$:


Theorem: Any solution to Pell's equation is a convergent $p_n/q_n$.


One can find more solutions recursively because the group of units in $\mathbb{Z}[\sqrt{d}]$ is $\mathbb{Z}_2 \times C_n$ for some
cyclic group $C_n$. The other solutions $(x_k, y_k)$ can be obtained from $x_k + \sqrt{d} y_k = (x_1 + \sqrt{d} y_1)^k$.
One of the first instances where the equation appeared is the Archimedes cattle problem,
which is $x^2 - 410286423278424 y^2 = 1$. The equation is named after John Pell, who has nothing
to do with the equation. It was Euler who attributed the solution by mistake to Pell. It was
first found by William Brouncker. The approach through continued fractions started with Euler
and Lagrange. See [562] [88] [444].
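
The theorem suggests a procedure: run through the convergents of $\sqrt{d}$ and stop at the first one solving the equation. The sketch below does this in exact integer arithmetic, using the standard recursion for the partial quotients of a quadratic irrational; the function names are illustrative assumptions and the code is not meant as an efficient solver for large $d$.

```python
from math import isqrt

def sqrt_partial_quotients(d):
    """Yield a_0, a_1, ... of the continued fraction of sqrt(d), d not a square."""
    a0 = isqrt(d)
    m, q, a = 0, 1, a0
    while True:
        yield a
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q

def pell_fundamental_solution(d):
    """First convergent p/q of sqrt(d) with p^2 - d q^2 == 1."""
    if isqrt(d) ** 2 == d:
        raise ValueError("d must not be a perfect square")
    p_old, q_old = 0, 1      # p_{-2}, q_{-2}
    p_cur, q_cur = 1, 0      # p_{-1}, q_{-1}
    for a in sqrt_partial_quotients(d):
        p_old, q_old, p_cur, q_cur = p_cur, q_cur, a * p_cur + p_old, a * q_cur + q_old
        if p_cur * p_cur - d * q_cur * q_cur == 1:
            return p_cur, q_cur

def next_solution(d, fundamental, current):
    """Multiply x + sqrt(d) y by the fundamental unit x1 + sqrt(d) y1."""
    x1, y1 = fundamental
    x, y = current
    return x1 * x + d * y1 * y, x1 * y + y1 * x

print(pell_fundamental_solution(2))    # (3, 2)
print(pell_fundamental_solution(7))    # (8, 3)
print(pell_fundamental_solution(61))   # (1766319049, 226153980)
print(next_solution(2, (3, 2), (3, 2)))  # (17, 12)
```

The case $d = 61$ shows how large the fundamental solution can be even for small $d$, which is why the cattle problem with its fifteen digit coefficient is out of reach for naive search.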
85. GAUSS-BONNET-CHERN 


Let $(M,g)$ be a Riemannian manifold of dimension $d$ with volume element $d\mu$. If
$R^{ij}_{kl}$ is the Riemann curvature tensor with respect to the metric $g$, define the constant
$C = ((4\pi)^{d/2} (-2)^{d/2} (d/2)!)^{-1}$ and the curvature
$K(x) = C \sum_{\sigma,\pi} \mathrm{sign}(\sigma) \mathrm{sign}(\pi) R^{\sigma(1)\sigma(2)}_{\pi(1)\pi(2)} \cdots R^{\sigma(d-1)\sigma(d)}_{\pi(d-1)\pi(d)}$,
where the sum is over all permutations $\pi, \sigma$ of $\{1, \dots, d\}$. It can be interpreted as a Pfaffian.
In odd dimensions, the curvature is zero. Denote by $\chi(M)$ the Euler characteristic of $M$.


Theorem: $\int_M K(x) \, d\mu(x) = 2\pi \chi(M)$.


The case $d = 2$ was solved by Carl Friedrich Gauss and by Pierre Ossian Bonnet in 1848. Gauss
knew the theorem but never published it. In the case $d = 2$, the curvature $K$ is the Gaussian
curvature which is the product of the principal curvatures $\kappa_1, \kappa_2$ at a point. For a sphere
of radius $R$ for example, the Gauss curvature is $1/R^2$ and $\chi(M) = 2$. The volume form is
then the usual area element normalized so that $\int_M 1 \, d\mu(x) = 1$. Allendoerfer-Weil in 1943
gave the first proof, based on previous work of Allendoerfer, Fenchel and Weil. Chern finally, in
1944, proved the theorem independent of an embedding. There is also a proof due to Vijay Kumar
Patodi. A more classical approach is in [660].


86. ATIYAH-SINGER 


Assume $M$ is a compact orientable finite dimensional manifold of dimension $n$ and assume $D$
is an elliptic differential operator $D : E \to F$ between two smooth vector bundles $E, F$
over $M$. Using multi-index notation $D^k = \partial_{x_1}^{k_1} \cdots \partial_{x_n}^{k_n}$, a differential operator $\sum_k a_k(x) D^k$
is called elliptic if for all $x$, its symbol, the polynomial $\sigma(D)(y) = \sum_{|k|=n} a_k(x) y^k$, is not zero
for nonzero $y$. Elliptic regularity assures that the kernel of $D$ and the kernel of the
adjoint $D^* : F \to E$ are both finite dimensional. The analytical index of $D$ is defined as
$\chi(D) = \dim(\ker(D)) - \dim(\ker(D^*))$. We think of it as the Euler characteristic of $D$. The
topological index of $D$ is defined as the integral of the $n$-form $K_D = (-1)^n \mathrm{ch}(\sigma(D)) \cdot \mathrm{td}(TM)$
over $M$. This $n$-form is the cup product $\cdot$ of the Chern character $\mathrm{ch}(\sigma(D))$ and the Todd
class of the complexified tangent bundle $TM$ of $M$. We think about $K_D$ as a curvature.
Integration is done over the fundamental class $[M]$ of $M$ which is the natural volume form
on $M$. The Chern character and the Todd classes are both mixed rational cohomology classes.
On a complex vector bundle $E$ they are both given by concrete power series of Chern classes
$c_k(E)$ like $\mathrm{ch}(E) = e^{a_1(E)} + \cdots + e^{a_n(E)}$ and $\mathrm{td}(E) = a_1(1 - e^{-a_1})^{-1} \cdots a_n(1 - e^{-a_n})^{-1}$ with
$a_i = c_1(L_i)$ if $E = L_1 \oplus \cdots \oplus L_n$ is a direct sum of line bundles.


Theorem: The analytic index and topological index agree: $\chi(D) = \int_M K_D$.


In the case when $D = d + d^*$ from the vector bundle of even forms $E$ to the vector bundle of odd
forms $F$, then $K_D$ is the Gauss-Bonnet curvature and $\chi(D) = \chi(M)$. Israil Gelfand conjectured
around 1960 that the analytical index should have a topological description. The Atiyah-Singer
index theorem was proven in 1963 by Michael Atiyah and Isadore Singer. The result
generalizes the Gauss-Bonnet-Chern and Riemann-Roch-Hirzebruch theorems. According to
[566], "the theorem is valuable, because it connects analysis and topology in a beautiful and
insightful way". See [528].


87. COMPLEX MULTIPLICATION 


An $n$'th root of unity is a solution to the equation $z^n = 1$ in the complex plane $\mathbb{C}$. It is called
primitive if it is not a solution to $z^k = 1$ for any $1 \leq k < n$. A cyclotomic field is a
number field $\mathbb{Q}(\zeta_n)$ which is obtained by adjoining a complex primitive root of unity $\zeta_n$ to
$\mathbb{Q}$. Every cyclotomic field is an Abelian field extension of the field of rational numbers $\mathbb{Q}$. The
Kronecker-Weber theorem reverses this. It is also called the main theorem of class field
theory over $\mathbb{Q}$:


Theorem: Every Abelian extension L/Q is a subfield of a cyclotomic field. 


Abelian field extensions of $\mathbb{Q}$ are also called class fields. It follows that any algebraic number
field $K/\mathbb{Q}$ with Abelian Galois group has a conductor, the smallest $n$ such that $K$ lies in
the field generated by $n$'th roots of unity. Extending this theorem to other base number fields is
Kronecker's Jugendtraum or Hilbert's twelfth problem. The theory of complex
multiplication does the generalization for imaginary quadratic fields. The theorem was stated
by Leopold Kronecker in 1853 and proven by Heinrich Martin Weber in 1886. A generalization
to local fields was done by Jonathan Lubin and John Tate in 1965 and 1966. (A local field is
a locally compact topological field with respect to some non-discrete topology. The list of local
fields is $\mathbb{R}$, $\mathbb{C}$, field extensions of the p-adic numbers $\mathbb{Q}_p$, or formal Laurent series $\mathbb{F}_q((t))$ over
a finite field $\mathbb{F}_q$.) The study of cyclotomic fields came from elementary geometric problems
like the construction of a regular n-gon with ruler and compass. Gauss constructed a regular
17-gon and showed that a regular n-gon with prime $n$ can be constructed if and only if $n$ is a Fermat
prime $F_k = 2^{2^k} + 1$ (the known ones are 3, 5, 17, 257, 65537 and a problem of Eisenstein of 1844
asks whether there are infinitely many). Further interest came in the context of Fermat's last
theorem because $x^n + y^n = z^n$ can be written as $x^n + y^n = (x + y)(x + \zeta y) \cdots (x + \zeta^{n-1} y)$,
where $\zeta$ is a primitive $n$'th root of unity for odd $n > 2$.
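
The simplest instances of the Kronecker-Weber theorem are the quadratic fields. The sketch below evaluates the quadratic Gauss sum $\sum_a (a/p) \zeta_p^a$ numerically for an odd prime $p$, which exhibits $\sqrt{p}$ or $i\sqrt{p}$ as an element of the cyclotomic field $\mathbb{Q}(\zeta_p)$. It is a floating point illustration with $\zeta_p = e^{2\pi i/p}$, not a proof, and the function names are chosen here.

```python
import cmath

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def gauss_sum(p):
    """Sum of (a/p) * zeta^a over a = 1, ..., p-1 with zeta = exp(2 pi i / p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(legendre(a, p) * zeta ** a for a in range(1, p))

print(gauss_sum(5))    # approximately sqrt(5): Q(sqrt(5)) lies in Q(zeta_5)
print(gauss_sum(7))    # approximately i*sqrt(7): Q(sqrt(-7)) lies in Q(zeta_7)
print(gauss_sum(13))   # approximately sqrt(13)
```

The sum is real for $p \equiv 1$ modulo 4 and purely imaginary for $p \equiv 3$ modulo 4, so the quadratic field inside $\mathbb{Q}(\zeta_p)$ is $\mathbb{Q}(\sqrt{p})$ in the first case and $\mathbb{Q}(\sqrt{-p})$ in the second.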


88. CHOQUET THEORY


Let $K$ be a compact and convex set in a Banach space $X$. A point $x \in K$ is called extreme
if $x$ is not in an open interval $(a,b)$ with $a, b \in K$. Let $E$ be the set of extreme points in $K$. The
Krein-Milman theorem, proven in 1940 by Mark Krein and David Milman, assures that $K$
is the closed convex hull of $E$. Given a probability measure $\mu$ on $E$, it defines the point $x = \int y \, d\mu(y)$.
We say that $x$ is the Barycenter of $\mu$. The Choquet theorem is


Theorem: Every point in K is a Barycenter of its extreme points. 


This result of Choquet implies the Krein-Milman theorem. It generalizes to locally convex
topological vector spaces. The measure $\mu$ is not unique in general. It is unique in finite dimensions if $K$ is
a simplex. But in general, as shown by Heinz Bauer in 1961, for an extreme point $x \in K$ the
measure $\mu_x$ is unique. The theorem was proven by Gustave Choquet in 1956 and was generalized
by Errett Bishop and Karl de Leeuw in 1959.


89. HELLY’S THEOREM 


Let $\mathcal{K} = \{K_1, \dots, K_n\}$ be a family of convex sets $K_1, K_2, \dots, K_n$ in the Euclidean space $\mathbb{R}^d$
and assume that $n > d$. Let $\mathcal{K}_m$ denote the set of subsets of $\mathcal{K}$ which have exactly $m$ elements.
We say that $\mathcal{K}_m$ has the intersection property if each of its elements has a non-empty
common intersection. The theorem of Helly assures that


Theorem: $\mathcal{K}_n$ has the intersection property if $\mathcal{K}_{d+1}$ has.


The theorem was proven in 1913 by Eduard Helly. It generalizes to an infinite collection
of compact, convex subsets. This theorem led Johann Radon to prove in 1921 the Radon
theorem which states that any set of $d + 2$ points in $\mathbb{R}^d$ can be partitioned into two disjoint
subsets whose convex hulls intersect. A nice application of Radon's theorem is the
Borsuk-Ulam theorem which states that a continuous function $f$ from the d-dimensional sphere $S^d$
to $\mathbb{R}^d$ must map some pair of antipodal points to the same point: $f(x) = f(-x)$ has a solution.
For example, if $d = 2$, this implies that at every moment there are two antipodal
points on the Earth's surface for which the temperature and the pressure are the same. The
Borsuk-Ulam theorem appears to have first been stated in work of Lazar Lyusternik and Lev
Shnirelman in 1930, and proven by Karol Borsuk in 1933 who attributed it to Stanisław Ulam.
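
In dimension $d = 1$, convex sets are intervals and Helly's theorem says that pairwise intersecting intervals have a common point. The sketch below checks this for closed intervals $[a, b]$ given as pairs; the restriction to closed bounded intervals and the sample families are assumptions made for the illustration.

```python
from itertools import combinations

def common_point(intervals):
    """A point in the intersection of closed intervals (lo, hi), or None if empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return lo if lo <= hi else None

def has_intersection_property(family, m):
    """True if every sub-family with exactly m members has a common point."""
    return all(common_point(sub) is not None for sub in combinations(family, m))

family = [(0, 4), (1, 5), (2, 6), (3, 7)]
print(has_intersection_property(family, 2), common_point(family))   # True 3

broken = [(0, 1), (1, 2), (2, 3)]
print(has_intersection_property(broken, 2), common_point(broken))   # False None
```

In the first family every pair of intervals meets, and accordingly all four share the point 3. In the second family the first and last interval are disjoint, so the hypothesis for $d + 1 = 2$ fails and there is no common point.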


90. WEAK MIXING 


An automorphism $T$ of a probability space $(X, \mathcal{A}, m)$ is a measure preserving invertible
measurable transformation from $X$ to $X$. It is called ergodic if $T(A) = A$ implies $m(A) = 0$
or $m(A) = 1$. It is called mixing if $m(T^n(A) \cap B) \to m(A) \cdot m(B)$ for $n \to \infty$ for all $A, B$.
It is called weakly mixing if $n^{-1} \sum_{k=0}^{n-1} |m(T^k(A) \cap B) - m(A) \cdot m(B)| \to 0$ for all $A, B \in \mathcal{A}$
and $n \to \infty$. This is equivalent to the fact that the unitary operator $Uf = f(T)$ on $L^2(X)$ has
no point spectrum when restricted to the orthogonal complement of the constant functions.
A topological transformation (a continuous map on a locally compact topological space) with
a weakly mixing invariant measure is not integrable, as for integrability one wants every
invariant measure to lead to an operator $U$ with pure point spectrum, so that it is conjugated to
a group translation. Let $G$ be the complete topological group of automorphisms of $(X, \mathcal{A}, m)$
with the weak topology: $T_j$ converges to $T$ weakly, if $m(T_j(A) \Delta T(A)) \to 0$ for all $A \in \mathcal{A}$; this
topology is metrizable and completeness is defined with respect to an equivalent metric.


Theorem: A generic T is weakly mixing and so ergodic. 


Anatol Katok and Anatolii Mikhailovich Stepin in 1967 [870] proved that purely singular
continuous spectrum of $U$ is generic. A new proof was given by [126], and a short proof using
Rokhlin's lemma, Halmos' conjugacy lemma and Simon's "wonderland theorem"
establishes both genericity of weak mixing and genericity of singular spectrum. On the topological
side, a generic volume preserving homeomorphism of a manifold has purely singular continuous
spectrum, which strengthens Oxtoby-Ulam's theorem [526] about generic ergodicity.
The Wonderland theorem of Simon [605] also made it possible to prove that a generic invariant measure
of a shift is singular continuous [393] or that zero-dimensional singular continuous spectrum is
generic for open sets of flows on the torus, which also shows that open sets of Hamiltonian
systems contain generic subsets with both quasi-periodic and weakly mixing invariant tori
[394].
91. UNIVERSALITY 


The space $X$ of unimodal maps is the set of twice continuously differentiable even maps
$f : [-1,1] \to [-1,1]$ satisfying $f(0) = 1$, $x f'(x) < 0$ for $x \neq 0$ and $\lambda = f(1) < 0$. The
Feigenbaum-Cvitanović functional equation (FCE) is $g = Tg$ with $T(g)(x) = \frac{1}{\lambda} g(g(\lambda x))$. The map $T$ is
a renormalization map.


Theorem: There exists an analytic hyperbolic fixed point of T. 


The first proof was given by Oscar Lanford III in 1982 (computer assisted). See [344].
That proof also established that the fixed point is hyperbolic with a one-dimensional unstable
manifold and positive expanding eigenvalue. This explains some universal features of
unimodal maps found experimentally in 1978 by Mitchell Feigenbaum, a phenomenon which is now called
Feigenbaum universality. The result has been ported to area preserving maps [190].


92. COMPACTNESS 


Let X be a compact metric space (X,d). The Banach space C(X) of real-valued continuous
functions is equipped with the supremum norm. A closed subset F ⊂ C(X) is called uniformly
bounded if for every x the supremum of all values |f(x)| with f ∈ F is finite. The set F is
called equicontinuous if for every x and every ε > 0 there exists δ > 0 such that if d(x, y) < δ,
then |f(x) − f(y)| < ε for all f ∈ F. A set F is called precompact if its closure is compact.
The Arzelà-Ascoli theorem is:


Theorem: Equicontinuous uniformly bounded sets in C(X) are precompact. 


The result also holds on compact Hausdorff spaces and not only on metric spaces. In the complex,
there is a variant called Montel's theorem, which is the fundamental normality test for
holomorphic functions: a uniformly bounded family of holomorphic functions on a complex domain
G is normal, meaning that its closure is compact with respect to the compact-open topology. The
compact-open topology in C(X,Y) is the topology defined by the sub-base of all sets
F_{K,U} = {f | f(K) ⊂ U} of continuous maps, where K runs over all compact subsets of X and U
runs over all open subsets of Y.
93. GEODESIC


The geodesic distance d(x, y) between two points x, y on a Riemannian manifold (M, g) is
defined as the length of the shortest geodesic γ connecting x with y. This renders the manifold
a metric space (M, d). We assume it is locally compact, meaning that every point x ∈ M has
a compact neighborhood. A metric space is called complete if every Cauchy sequence in M
has a convergent subsequence. (A sequence x_k is called a Cauchy sequence if for every ε > 0,
there exists n such that for all i, j > n one has d(x_i, x_j) < ε.) The local existence theorem for
differential equations assures that solutions of the geodesic equations exist for small enough
time. This can be restated by saying that the exponential map T_x M → M, which assigns to a
vector v ≠ 0 in the tangent space T_x M the point γ(|v|) on the geodesic γ(t) with γ(0) = x and
initial velocity v/|v|, is defined for small |v|. A Riemannian
manifold M is called geodesically complete if the exponential map can be extended to the
entire tangent space T_x M for every x ∈ M. This means that geodesics can be continued for all
times. The Hopf-Rinow theorem assures:


Theorem: Completeness and geodesic completeness are equivalent. 


The theorem was named after Heinz Hopf and his student Willi Rinow who published it in 
1931. See [831] [178]. 


94. CRYSTALLOGRAPHY 


A wall paper group is a discrete subgroup of the Euclidean symmetry group E_2 of the
plane which contains two linearly independent translations. Wall paper groups classify
two-dimensional patterns according to their symmetry. In
the plane R^2, the underlying group is the group E_2 of Euclidean plane symmetries, which
contains translations, rotations, reflections and glide reflections. This group is the group
of rigid motions. It is a three dimensional Lie group which according to Klein's Erlangen
program characterizes Euclidean geometry. Every element in E_2 can be given as a pair
(A, b), where A is an orthogonal matrix and b is a vector. A subgroup G of E_2 is called discrete
if there is a positive minimal distance between two elements of the group. This implies the
crystallographic restriction theorem assuring that only rotations of order 2, 3, 4 or 6 can
appear. This means only rotations by 180, 120, 90 or 60 degrees can occur in a wall paper
group.


Theorem: There are 17 wallpaper groups.


The first proof was given by Evgraf Fedorov in 1891 and then by George Polya in 1924. In
three dimensions there are 230 space groups and 219 types if chiral copies are identified. In
space there are 65 space groups which preserve the orientation. See [525] [303].
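
The crystallographic restriction can be checked numerically. The sketch below relies on a standard argument that is not spelled out in the text above and is therefore an assumption here: a rotation preserving a lattice is given by an integer matrix in a lattice basis, so its trace 2 cos(2π/n) must be an integer. The function name and the tolerance are illustrative choices.

```python
from math import cos, pi

def lattice_compatible_orders(max_order=24, tol=1e-9):
    """Orders n for which a rotation by 2*pi/n has integer trace 2*cos(2*pi/n)."""
    orders = []
    for n in range(1, max_order + 1):
        trace = 2 * cos(2 * pi / n)
        # An integer trace is necessary for the rotation to preserve a lattice.
        if abs(trace - round(trace)) < tol:
            orders.append(n)
    return orders

print(lattice_compatible_orders())
```

Besides the trivial order 1, only the orders 2, 3, 4 and 6 survive the test.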


95. QUADRATIC FORMS 


A symmetric square matrix Q of size n × n with integer entries defines an integer quadratic
form Q(x) = Σ_{i,j=1}^n Q_{ij} x_i x_j. It is called positive if Q(x) > 0 whenever x ≠ 0. A positive
integral quadratic form is called universal if its range is N. For example, by the Lagrange
four square theorem, the form Q(x_1, x_2, x_3, x_4) = x_1^2 + x_2^2 + x_3^2 + x_4^2 is universal. The
Conway-Schneeberger fifteen theorem tells


Theorem: Q is universal if it has {1,...,15} in the range.
The interest in quadratic forms started in the 17th century, especially about numbers which
can be represented as sums x^2 + y^2. Lagrange proved the four square theorem in 1770. In
1916, Ramanujan listed all diagonal quaternary forms which are universal. The 15 theorem
was proven in 1993 by John Conway and William Schneeberger (a student of Conway's in a
graduate course given in 1993). There is an analogous theorem for integral positive quadratic
forms; these are defined by positive definite matrices Q for which Q(x) takes only integer values.
The binary quadratic form x^2 + xy + y^2 for example is integral but not an integer quadratic form
because the corresponding matrix Q has fractions 1/2. In 2005, Manjul Bhargava and Jonathan
Hanke proved the 290 theorem, assuring that an integral positive quadratic form is universal if
it contains {1,...,290} in its range. See [140].
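
The role of the number 15 can be explored by brute force for diagonal forms. The following is a minimal sketch; the helper names `represented` and `missing` are illustrative, and the diagonal form x^2 + 2y^2 + 5z^2 + 5w^2 is an example chosen here, not one taken from the text above. The code only searches the finite range 1,...,15 and does not prove universality by itself.

```python
from itertools import product

def represented(coeffs, limit):
    """Integers in 1..limit of the form sum(c_i * x_i^2) with integer x_i."""
    bound = int(limit ** 0.5) + 1  # larger |x_i| overshoot limit, since c_i >= 1
    values = set()
    # Non-negative x_i suffice because only squares enter the form.
    for xs in product(range(bound + 1), repeat=len(coeffs)):
        v = sum(c * x * x for c, x in zip(coeffs, xs))
        if 1 <= v <= limit:
            values.add(v)
    return values

def missing(coeffs, limit=15):
    """Sorted list of integers in 1..limit that the diagonal form does not represent."""
    return sorted(set(range(1, limit + 1)) - represented(coeffs, limit))

print(missing((1, 1, 1, 1)))  # four squares: nothing is missing
print(missing((1, 1, 1)))     # three squares miss some values
print(missing((1, 2, 5, 5)))  # represents 1..14 but not 15
```

The last form shows that the bound 15 in the theorem cannot be lowered to 14.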


96. SPHERE PACKING 


A sphere packing in R^d is an arrangement of non-overlapping unit spheres in the d-dimensional
Euclidean space R^d with volume measure μ. It is known that packings with maximal
densities exist. Denote by B_r(x) the ball of radius r centered at x ∈ R^d. If X is the set of
centers of the spheres and P = ∪_{x∈X} B_1(x) is the union of the unit balls centered at points in
X, then the density of the packing is defined as Δ_d = limsup_{r→∞} ∫_{B_r(0)} 1_P dμ / ∫_{B_r(0)} 1 dμ,
where 1_P is the indicator function of P. The
sphere packing problem is now solved in 5 different cases:


Theorem: Optimal sphere packings are known for d = 1, 2,3, 8, 24. 


The one-dimensional case Δ_1 = 1 is trivial. The case Δ_2 = π/√12 was known since Axel Thue
in 1910 but proven only by László Fejes Tóth in 1943. The case d = 3 was called the Kepler
conjecture as Johannes Kepler conjectured Δ_3 = π/√18. It was settled by Thomas Hales in
1998 using computer assistance. A complete formal proof appeared in 2015. The case d = 8 was
settled by Maryna Viazovska who proved in 2017 [671] that Δ_8 = π^4/384 and also established
uniqueness. The densest packing in the case d = 8 is the E_8 lattice. The proof is based on
linear programming bounds developed by Henry Cohn and Noam Elkies in 2003. Later with
other collaborators, she also covered the case d = 24. The densest packing in dimension 24 is
the Leech lattice. For sphere packing see [146].
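
To get a feeling for the sizes of these optimal densities, one can evaluate the closed forms quoted above numerically. This is only an arithmetic illustration; the dictionary name `densities` is an arbitrary choice, and the case d = 24 is left out because no value is stated in the text.

```python
from math import pi, sqrt

# Optimal packing densities as quoted in the text, keyed by dimension d.
densities = {
    1: 1.0,
    2: pi / sqrt(12),
    3: pi / sqrt(18),
    8: pi ** 4 / 384,
}

for d, value in densities.items():
    print(f"d = {d}: density = {value:.6f}")
```

The values drop from about 0.907 in the plane to about 0.740 in space and about 0.254 in dimension 8.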


97. STURM THEOREM 


Given a square free real-valued polynomial p, let p_k denote the Sturm chain, p_0 = p, p_1 = p',
p_2 = −(p_0 mod p_1), p_3 = −(p_1 mod p_2) etc. Let σ(x) be the number of sign changes, ignoring
zeros, in the sequence p_0(x), p_1(x), ..., p_m(x).


Theorem: The number of distinct roots of p in (a, b] is σ(a) − σ(b).


Sturm proved the theorem in 1829. He found his theorem on sequences while studying solutions
of differential equations (Sturm-Liouville theory) and credits Fourier for inspiration.
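
The root count can be tried out on small examples. The following is a minimal Python sketch, not an optimized implementation. It assumes the standard sign convention p_{k+1} = −(p_{k−1} mod p_k) and counts the roots in (a, b] as σ(a) − σ(b). All function names are illustrative, polynomials are given as coefficient lists with the highest degree first, and p is assumed to be square free of degree at least 1. Exact rational arithmetic is used to avoid rounding problems.

```python
from fractions import Fraction

def poly_eval(p, x):
    """Evaluate a polynomial given as coefficients, highest degree first (Horner)."""
    value = Fraction(0)
    for c in p:
        value = value * x + c
    return value

def poly_derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_remainder(a, b):
    """Remainder of the polynomial division of a by b."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient is now zero
    while a and a[0] == 0:
        a.pop(0)  # strip remaining leading zeros
    return a

def sturm_chain(p):
    p = [Fraction(c) for c in p]
    chain = [p, poly_derivative(p)]
    while True:
        r = poly_remainder(chain[-2], chain[-1])
        if not r:
            break
        chain.append([-c for c in r])  # negated remainder
    return chain

def sign_changes(chain, x):
    values = [v for v in (poly_eval(q, x) for q in chain) if v != 0]
    return sum(1 for s, t in zip(values, values[1:]) if (s > 0) != (t > 0))

def count_roots(p, a, b):
    """Number of distinct real roots of p in (a, b]."""
    chain = sturm_chain(p)
    return sign_changes(chain, a) - sign_changes(chain, b)

print(count_roots([1, 0, -3, 1], -2, 2))  # x^3 - 3x + 1 on (-2, 2]
```

For x^3 − 3x + 1 the chain is p_0, 3x^2 − 3, 2x − 1, 9/4, and the sign changes drop from 3 at x = −2 to 0 at x = 2.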


98. SMITH NORMAL FORM 


An integer m × n matrix A is said to be expressible in Smith normal form if there exist an
integer m × m matrix S and an integer n × n matrix T, both invertible over the integers, so
that SAT is a diagonal matrix
Diag(a_1,...,a_r,0,0,0) with a_i | a_{i+1}. The integers a_i are called elementary divisors. They
can be written as a_i = d_i(A)/d_{i−1}(A), where d_0(A) = 1 and d_k(A) is the greatest common
divisor of all k × k minors of A. The Smith normal form is called unique if the elementary
divisors a_i are determined up to a sign.


Theorem: Any integer matrix has a unique Smith normal form.


The result was proven by Henry John Stephen Smith in 1861. The result holds more generally
in a principal ideal domain, which is an integral domain (a ring R in which ab = 0 implies
a = 0 or b = 0) in which every ideal (an additive subgroup I of the ring such that ab ∈ I if
a ∈ I and b ∈ R) is generated by a single element.
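
The description of the elementary divisors through greatest common divisors of minors can be turned directly into a small computation. The sketch below is meant for tiny matrices only, since it enumerates all minors and uses Laplace expansion; it does not compute the transformation matrices S and T. The function names `det` and `elementary_divisors` are illustrative, and the divisors are returned with positive sign.

```python
from itertools import combinations
from math import gcd

def det(M):
    """Integer determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def elementary_divisors(A):
    """Nonzero elementary divisors a_k = d_k / d_{k-1}, d_k = gcd of all k x k minors."""
    m, n = len(A), len(A[0])
    divisors, d_prev = [], 1
    for k in range(1, min(m, n) + 1):
        d_k = 0
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                minor = [[A[i][j] for j in cols] for i in rows]
                d_k = gcd(d_k, det(minor))
        if d_k == 0:
            break  # all larger minors vanish as well
        divisors.append(d_k // d_prev)
        d_prev = d_k
    return divisors

print(elementary_divisors([[2, 0], [0, 3]]))
print(elementary_divisors([[4, 0], [0, 6]]))
```

For Diag(2, 3) one gets d_1 = 1 and d_2 = 6, so the Smith normal form is Diag(1, 6) rather than the matrix itself.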


99. SPECTRAL PERTURBATION 


A complex valued matrix A is self-adjoint = Hermitian if A* = A, where A*_{ij} = Ā_{ji}. The
spectral theorem assures that A has real eigenvalues. Given two self-adjoint complex n × n
matrices A, B with eigenvalues α_1 ≤ α_2 ≤ ··· ≤ α_n and β_1 ≤ β_2 ≤ ··· ≤ β_n, one has the
Lidskii-Last theorem:


Theorem: Σ_{j=1}^n |α_j − β_j| ≤ Σ_{i,j=1}^n |A − B|_{ij}.


The result has been deduced by Yoram Last (around 1993) from Lidskii's inequality, found
in 1950 by Victor Lidskii: Σ_j |α_j − β_j| ≤ Σ_j |γ_j|, where γ_j are the eigenvalues of C = B − A (see
page 14). The original Lidskii inequality also holds for p ≥ 1: Σ_j |α_j − β_j|^p ≤ Σ_j |γ_j|^p.
Last's spin on it allows one to estimate the l^1 spectral distance of two self-adjoint matrices
using the l^1 distance of the matrices. This is handy as we often know the matrices A, B explicitly
rather than the eigenvalues γ_j of A − B.


100. RADON TRANSFORM 


In order to solve the tomography problem like magnetic resonance imaging (MRI) of
finding the density function g(x, y, z) of a three dimensional body, one looks at a slice f(x, y) =
g(x, y, c), where z = c is kept constant, and measures the Radon transform R(f)(p, θ) =
∫_{x cos(θ) + y sin(θ) = p} f(x, y) ds. This quantity is the absorption rate due to nuclear magnetic
resonance along the line L of polar angle θ in distance p from the center. Reconstructing
f(x, y) = g(x, y, c) for different c allows one to recover the tissue density g and so to "see inside
the body".


Theorem: The Radon transform can be diagonalized and so pseudo inverted.


We only need that the Fourier series f(r, φ) = Σ_n f_n(r) e^{inφ} converges uniformly for all r > 0 and
that f_n(r) has a Taylor series. The expansion f(r, φ) = Σ_{n∈Z} Σ_{k=1}^∞ f_{n,k} ψ_{n,k} with
ψ_{n,k}(r, φ) = r^{−k} e^{inφ} is an eigenfunction expansion with eigenvalues
λ_{n,k} = 2 ∫_0^{π/2} cos(nx) cos(x)^{k−1} dx = (π/2^{k−1}) Γ(k) / (Γ((k+n+1)/2) Γ((k−n+1)/2)).
The inverse problem is subtle due to the existence of a kernel
spanned by {ψ_{n,k} | (n + k) odd, |n| > k}. One calls it an ill posed problem in the sense of
Hadamard. The Radon transform was first studied by Johann Radon in 1917 [309].


101. LINEAR PROGRAMMING 


Given two vectors c ∈ R^m and b ∈ R^n, and an n × m matrix A, a linear program is the
variational problem on R^m to maximize f(x) = c · x subject to the linear constraints Ax ≤ b
and x ≥ 0. The dual problem is to minimize b · y subject to A^T y ≥ c, y ≥ 0. The maximum
principle for linear programming tells that the solution is on the boundary of the convex
polytope formed by the feasible region defined by the constraints.


Theorem: Local optima of linear programs are global and on the boundary.


Since the solutions are located on the vertices of the polytope defined by the constraints, the
simplex algorithm for solving linear programs works: start at a vertex of the polytope, then
move along the edges in the direction of the gradient until the optimum is reached. If A = [2,3],
x = [x_1, x_2], b = 6 and c = [3,5], we have n = 1, m = 2. The problem is to maximize
f(x_1, x_2) = 3x_1 + 5x_2 on the triangular region 2x_1 + 3x_2 ≤ 6, x_1 ≥ 0, x_2 ≥ 0. Start at (0,0);
the best improvement is to go to (0,2), which is already the maximum. Linear programming is
used to solve practical problems in operations research. The simplex algorithm was formulated
by George Dantzig in 1947. It solves random problems nicely but there are expensive cases in
general and it is possible that cycles occur. One of the open problems of Steven Smale asks
for a strongly polynomial time algorithm deciding whether a solution of a linear programming
problem exists. See [505].
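
The small example can be verified by inspecting all vertices of the feasible region. The sketch below is a brute-force vertex enumeration for two variables, not the simplex algorithm; it assumes a bounded, non-empty feasible region and integer data. The function name `solve_lp_2d` is an illustrative choice.

```python
from itertools import combinations
from fractions import Fraction

def solve_lp_2d(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0 in two variables by checking vertices."""
    # Write every constraint, including x1 >= 0 and x2 >= 0, as row . x <= rhs.
    rows = [tuple(r) for r in A] + [(-1, 0), (0, -1)]
    rhs = list(b) + [0, 0]
    best = None
    for i, j in combinations(range(len(rows)), 2):
        (a1, a2), (b1, b2) = rows[i], rows[j]
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet
        # Cramer's rule for the intersection of the two boundary lines.
        x = Fraction(rhs[i] * b2 - a2 * rhs[j], det)
        y = Fraction(a1 * rhs[j] - rhs[i] * b1, det)
        feasible = all(r[0] * x + r[1] * y <= s for r, s in zip(rows, rhs))
        if feasible:
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

value, vertex = solve_lp_2d((3, 5), [(2, 3)], [6])
print(value, vertex)
```

The three vertices (0,0), (3,0) and (0,2) give the values 0, 9 and 10, so the maximum 10 is attained at (0,2).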


102. RANDOM MATRICES 


A random matrix A is given by an n × n array of independent, identically distributed random
variables A_{ij} of zero mean and standard deviation 1. The eigenvalues λ_j of A/√n define a
discrete measure μ_n = (1/n) Σ_j δ_{λ_j} called the spectral measure of A. The circular law on the
complex plane C is the probability measure μ_0 = 1_D/π, where D = {|z| ≤ 1} is the unit disk.
A sequence ν_n of probability measures converges weakly or in law to ν if for every continuous
and bounded function f : C → C one has ∫ f(z) dν_n(z) → ∫ f(z) dν(z). The circular law is:


Theorem: Almost surely, the spectral measures converge: μ_n → μ_0.


One can think of A_n as a sequence of larger and larger matrix valued random variables. The
circular law tells that the eigenvalues fill out the unit disk in the complex plane uniformly when
taking larger and larger matrices. It is a kind of central limit theorem. An older version due
to Eugene Wigner from 1955 is the semi-circular law telling that in the self-adjoint case, the
now real measures μ_n converge to a distribution with density √(4 − x^2)/(2π) on [−2,2]. The
circular law was stated first by Jean Ginibre in 1965 and Vyacheslav Girko in 1984. It was proven
first by Z.D. Bai in 1997. Various authors have generalized it and removed more and more
moment conditions. The latest condition was removed by Terence Tao and Van Vu in 2010,
proving so the above "fundamental theorem of random matrix theory". See [652].


103. DIFFEOMORPHISMS 


Let M be a compact Riemannian surface and T : M → M a C^2-diffeomorphism. A Borel
probability measure μ on M is T-invariant if μ(T(A)) = μ(A) for all A ∈ 𝒜. It is called
ergodic if T(A) = A implies μ(A) = 1 or μ(A) = 0. The Hausdorff dimension dim(μ) of
a measure μ is defined as the Hausdorff dimension of the smallest Borel set A of full measure
μ(A) = 1. The entropy h_μ(T) is the Kolmogorov-Sinai entropy of the measure-preserving
dynamical system (M, T, μ). For an ergodic surface diffeomorphism, the Lyapunov exponents
λ_1, λ_2 of (M, T, μ) are the logarithms of the eigenvalues of A = lim_{n→∞} [(dT^n(x))^* dT^n(x)]^{1/(2n)},
which is a limiting Oseledec matrix and constant μ almost everywhere due to ergodicity. Let
λ(T, μ) denote the harmonic mean of λ_1, −λ_2. The entropy-dimension-Lyapunov theorem
tells that for every T-invariant ergodic probability measure μ of T, one has:


Theorem: h_μ = dim(μ) λ/2.


This formula has become famous because it relates "entropy", "fractals" and "chaos", which are
all "rock star" notions also outside of mathematics. The theorem implies in the case of a Lebesgue
measure preserving symplectic transformation, where dim(μ) = 2 and λ_1 = −λ_2, that "entropy
= Lyapunov exponent", which is a formula of Pesin given by h_μ(T) = λ(T, μ). A similar result
holds for circle diffeomorphisms or smooth interval maps, where h_μ(T) = dim(μ) λ(T, μ). The
notion of Hausdorff dimension was introduced by Felix Hausdorff in 1918. Entropy was defined
in 1958 by Andrey Kolmogorov and in general by Yakov Sinai in 1959. Lyapunov exponents
were introduced with the work of Valery Oseledec in 1965. The above theorem is due to
Lai-Sang Young who proved it in 1982. François Ledrappier and Lai-Sang Young proved in 1985
that in arbitrary dimensions, h_μ = Σ_j λ_j γ_j, where γ_j are dimensions of μ in the direction
of the Oseledec spaces E_j. This is called the Ledrappier-Young formula. It implies the
Margulis-Ruelle inequality h_μ(T) ≤ Σ_j λ_j^+(T), where λ_j^+ = max(λ_j, 0) and λ_j(T) are the
Lyapunov exponents. In the case of a smooth T-invariant measure μ or more generally, for
SRB measures, there is an equality h_μ(T) = Σ_j λ_j^+(T) which is called the Pesin formula. See
[368] [191].


104. LINEARIZATION 


If F : M → M is a globally Lipschitz continuous function on a finite dimensional vector space
M, then the differential equation x' = F(x) has a global solution x(t) = f^t(x(0)) (a local
solution exists by the Picard-Lindelöf existence theorem and a global one by the Grönwall
inequality). An
equilibrium point of the system is a point x_0 for which F(x_0) = 0. This means that x_0 is a
fixed point of a differentiable mapping f = f^1, the time-1-map. We say that f is linearizable
near x_0 if there exists a homeomorphism φ from a neighborhood U of x_0 to a neighborhood V of
x_0 such that φ ∘ f ∘ φ^{−1} = df. The Sternberg-Grobman-Hartman linearization theorem
is


Theorem: If f is hyperbolic, then f is linearizable near x_0.


The theorem was proven by D.M. Grobman in 1959, Philip Hartman in 1960 and by Shlomo
Sternberg in 1958. This implies the existence of stable and unstable manifolds passing
through x_0. One can show more and this is due to Sternberg who wrote a series of papers
starting 1957 [627]: if A = df(x_0) satisfies no resonance condition, meaning that no relation
λ_0 = λ_1 ··· λ_j exists between eigenvalues of A, then a linearization to order n is a C^n map
φ(x) = x + g(x), with g(0) = g'(0) = 0 such that φ ∘ f ∘ φ^{−1}(x) = Ax + o(|x|^n) near x_0. We
say then that f can be n-linearized near x_0. The generalized result tells that non-resonance
fixed points of C^n maps are n-linearizable near a fixed point. See [435].


105. FRACTALS 


An iterated function system is a finite set of contractions {f_i}_{i=1}^n on a complete metric space
(X, d). The corresponding Hutchinson operator H(A) = ∪_i f_i(A) is then a contraction
on the Hausdorff metric of sets and has a unique fixed point called the attractor S of the
iterated function system. The definition of Hausdorff dimension is as follows: define h^s_δ(A) =
inf_U Σ_i |U_i|^s, where U = {U_i} runs over the δ-covers of A, and h^s(A) = lim_{δ→0} h^s_δ(A). The
Hausdorff dimension dim_H(S) finally is the value s, where h^s(S) jumps from ∞ to 0. If the
contractions are maps with contraction factors 0 < λ_j < 1, then the Hausdorff dimension of the
attractor S can be estimated with the similarity dimension of the contraction vector (λ_1,...,λ_n):
this number is defined as the solution s of the equation Σ_{j=1}^n λ_j^s = 1.


Theorem: dim_hausdorff(S) ≤ dim_similarity(S).


There is an equality if f_j are all affine contractions like f_j(x) = λ_j A_j x + β_j with the same
contraction factor and A_j are orthogonal and β_j are vectors (a situation which generates a
large class of popular fractals). For equality one also has to assume that there is an open
non-empty set G such that G_j = f_j(G) are disjoint. In the case when λ_j = λ are all the same,
nλ^{dim(S)} = 1, which implies dim(S) = −log(n)/log(λ). For the Smith-Cantor set S,
where f_1(x) = x/3 + 2/3, f_2(x) = x/3 and G = (0,1), one gets with n = 2 and λ = 1/3
the dimension dim(S) = log(2)/log(3). For the Menger carpet with n = 8 affine maps
f_{ij}(x, y) = (x/3 + i/3, y/3 + j/3) with 0 ≤ i ≤ 2, 0 ≤ j ≤ 2, (i, j) ≠ (1,1), the dimension is
log(8)/log(3). The Menger sponge is the analogous object with n = 20 affine contractions
in R^3 and has dimension log(20)/log(3). For the Koch curve on the interval, where n = 4
affine contractions of contraction factor 1/3 exist, the dimension is log(4)/log(3). These are all
fractals, sets with Hausdorff dimension different from an integer. The modern formulation of
iterated function systems is due to John E. Hutchinson from 1981. Michael Barnsley used the
concept for fractal compression algorithms, which use the idea that storing the rules
for an iterated function system is much cheaper than storing the actual attractor. Iterated
function systems appear in complex dynamics in the case when the Julia set is completely
disconnected; they have appeared earlier also in work of Georges de Rham in 1957. See [214].
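
When the contraction factors are not all equal, the similarity dimension has no closed form in general but is easy to compute numerically. The following is a minimal sketch which solves Σ_j λ_j^s = 1 by bisection, using that the left hand side decreases in s. The function name, the tolerance and the example with factors 1/2 and 1/4 are illustrative assumptions, not taken from the text.

```python
from math import log

def similarity_dimension(factors, tol=1e-12):
    """Solve sum(r**s for r in factors) == 1 for s >= 0 by bisection."""
    def excess(s):
        return sum(r ** s for r in factors) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # enlarge the bracket until the sum drops below 1
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(similarity_dimension([1 / 3, 1 / 3]), log(2) / log(3))  # Cantor set
print(similarity_dimension([1 / 3] * 20), log(20) / log(3))   # Menger sponge
print(similarity_dimension([1 / 2, 1 / 4]))                   # unequal factors
```

For equal factors the result agrees with −log(n)/log(λ); for the factors 1/2 and 1/4 the equation t + t^2 = 1 with t = 2^{−s} gives s = log_2 of the golden ratio.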


106. STRONG LAW OF SMALL NUMBERS 


Like the Bayes theorem or the Pigeon hole principle, which both are too simple to qualify as
"theorems" but still are of utmost importance, the "Strong law of small numbers" is not really
a theorem but a fundamental mathematical principle. It is more fundamental than a
specific theorem as it applies throughout mathematics. It is for example important in Ramsey
theory. The statement is put in different ways like "There aren't enough small numbers to
meet the many demands made of them". [282] puts it in the following catchy way:


Theorem: You can’t tell by looking. 


The point was made by Richard Guy in [282], who states two "corollaries": "superficial
similarities spawn spurious statements" and "early exceptions eclipse eventual essentials".
The statement is backed up with countless examples (a list of 35 is given in [282]).
Famous are Fermat's claim that all Fermat numbers 2^{2^n} + 1 are prime or the claim that the
number π_3(n) of primes of the form 4k + 3 in {1,...,n} is larger than the number π_1(n) of primes
of the form 4k + 1, so that the 4k + 3 primes win the prime race. Hardy and Littlewood showed
however that π_3(n) − π_1(n) changes sign infinitely often. The prime number theorem extended to
arithmetic progressions shows π_1(n) ~ n/(2 log(n)) and π_3(n) ~ n/(2 log(n)) but the density
of numbers with π_3(n) > π_1(n) is larger than 1/2. This is the Chebyshev bias. Experiments
then suggested the density to be 1 but also this is false: the density of numbers for which
π_3(n) > π_1(n) is smaller than 1. The principle is important in a branch of combinatorics called
Ramsey theory. But it not only applies in discrete mathematics. There are many examples,
where one can not tell by looking. When looking at the boundary of the Mandelbrot set for
example, one would tell that it is a fractal with Hausdorff dimension between 1 and 2. In reality
the Hausdorff dimension is 2 by a result of Mitsuhiro Shishikura. Mandelbrot himself thought
first "by looking" that the Mandelbrot set M is disconnected. Douady and Hubbard proved M
to be connected.
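
Both number theoretic examples can be explored with a few lines of code. The sketch below checks that the Fermat number 2^{2^5} + 1 is divisible by 641 and searches for the first n at which the 4k + 1 primes lead the prime race. The function name `prime_race_first_lead` and the search limits are illustrative choices.

```python
def prime_race_first_lead(limit):
    """Return the first n <= limit with pi_1(n) > pi_3(n), or None if there is none."""
    # Sieve of Eratosthenes: sieve[n] == 1 exactly when n is prime.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    pi1 = pi3 = 0
    for n in range(2, limit + 1):
        if sieve[n]:
            if n % 4 == 1:
                pi1 += 1
            elif n % 4 == 3:
                pi3 += 1
        if pi1 > pi3:
            return n
    return None

fermat_5 = 2 ** (2 ** 5) + 1
print(fermat_5 % 641)                # 0, so the fifth Fermat number is not prime
print(prime_race_first_lead(20000))  # None: the 4k + 3 primes never trail up to here
print(prime_race_first_lead(30000))
```

Looking at all n up to 20000 would suggest that the 4k + 3 primes never fall behind; the first exception appears only at n = 26861.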


107. RAMSEY THEORY


Let G be the complete graph with n vertices. An edge labeling with r colors is an assignment
of r numbers to the edges of G. A complete sub-graph of G is called a clique. If it has
s vertices, it is denoted by K_s. A graph G is called monochromatic if all edges in G have
the same color. (We use here coloring as a short for edge labeling and not in the sense
of chromatology where an edge coloring assumes that intersecting edges have different colors.)
Ramsey's theorem is:


Theorem: For large n, every r-colored K_n contains a monochromatic K_s.


So, there exist Ramsey numbers R(r,s) such that for n ≥ R(r,s), every coloring of the edges of
K_n with r colors contains a monochromatic s-clique. A famous case is the identity R(3,3) = 6,
written in the standard two-color notation. Take n = 6 people. It
defines the complete graph G. If two of them are friends, color the edge blue, otherwise red.
This friendship graph therefore is an r = 2 coloring of G. Up to relabeling of the people and
swapping of the two colors, there are 78 possible colorings. In
each of them, there is a triangle of friends or a triangle of strangers: in a group of 6 people,
there is either a clique of 3 friends or a clique of 3 complete strangers. The theorem was
proven by Frank Ramsey in 1930. Paul Erdős asked for explicit estimates of R(s), which is
the least integer n such that any graph on n vertices contains either a clique of size s (a set
where all are connected to each other) or an independent set of size s (a set where none are
connected to each other). Graham for example asks whether the limit of R(n)^{1/n} exists. Ramsey
theory also deals with other sets: van der Waerden's theorem from 1927 for example tells that
if the positive integers N are colored with r colors, then for every k, there exists an N called
W(r,k) such that the finite set {1,...,N} has an arithmetic progression of length k with the same
color. For example, W(2,3) = 9. Also here, it is an open problem to find a formula for W(r,k) or
even give good upper bounds.
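
The two small cases R(3,3) = 6 and W(2,3) = 9 are within reach of an exhaustive search. The sketch below simply runs through all colorings, which is only feasible for such tiny parameters; the function names are illustrative choices.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (i, j) with i < j to a color."""
    for a, b, c in combinations(range(n), 3):
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

def every_coloring_has_mono_triangle(n, colors=2):
    """True if every edge coloring of K_n contains a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    for assignment in product(range(colors), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, assignment))):
            return False
    return True

def every_coloring_has_mono_ap(N, k=3, colors=2):
    """True if every coloring of {1,...,N} has a monochromatic k-term progression."""
    progressions = [
        range(a, a + k * d, d)
        for a in range(1, N + 1)
        for d in range(1, N)
        if a + (k - 1) * d <= N
    ]
    for coloring in product(range(colors), repeat=N):
        if not any(len({coloring[x - 1] for x in ap}) == 1 for ap in progressions):
            return False
    return True

print(every_coloring_has_mono_triangle(5), every_coloring_has_mono_triangle(6))
print(every_coloring_has_mono_ap(8), every_coloring_has_mono_ap(9))
```

The search covers all 2^15 labeled colorings of K_6, and finds colorings without monochromatic structure for K_5 and for {1,...,8}, which shows that the values 6 and 9 are sharp.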


108. POINCARE DUALITY 


For a differentiable Riemannian n-manifold (M, g) there is an exterior derivative d = d_p
which maps p-forms Λ^p to (p + 1)-forms Λ^{p+1}. For p = 0, the derivative is called the gradient,
for p = 1, the derivative is called the curl and for p = n − 1, the derivative is the adjoint of
divergence. The Riemannian metric defines an inner product (f, h) on Λ^p, allowing so to see
Λ^p as part of a Hilbert space and to define the adjoint d* of d. It is a linear map from Λ^{p+1}
to Λ^p. The exterior derivative defines so the self-adjoint Dirac operator D = d + d* and the
Hodge Laplacian L = D^2 = dd* + d*d which now leaves each Λ^p invariant. Hodge theory
assures that dim(ker(L|Λ^p)) = b_p = dim(H^p(M)), where H^p(M) is the p'th cohomology
group, the kernel of d_p modulo the image of d_{p−1}. Poincaré duality is:


Theorem: If M is an orientable n-manifold, then b_p(M) = b_{n−p}(M).


The Hodge dual of g ∈ Λ^p is defined as the unique *g ∈ Λ^{n−p} satisfying (f, *g) = (f ∧ g, ω)
for all f ∈ Λ^{n−p}, where ω is the volume form. One has d* f = (−1)^{n+np+1} * d * f and
L * f = * L f. This implies
that * is a unitary map from ker(L|Λ^p) to ker(L|Λ^{n−p}), proving so the duality theorem. For
n = 4k, one has *^2 = 1 on Λ^{2k}, allowing to define the Hirzebruch signature σ := dim{u | Lu =
0, *u = u} − dim{u | Lu = 0, *u = −u}. The Poincaré duality theorem was first stated by Henri
Poincaré in 1895. It took until the 1930s to clean out the notions and make it precise. The
Hodge approach establishing an explicit isomorphism between harmonic p and n − p forms
appears for example in [156].


109. ROKHLIN-KAKUTANI APPROXIMATION 


Let T be an automorphism of a probability space (Ω, 𝒜, μ). This means μ(A) = μ(T(A)) for
all A ∈ 𝒜. The system T is called aperiodic, if the set of periodic points P = {x ∈
Ω | ∃n > 0, T^n x = x} has measure μ(P) = 0. A set B ∈ 𝒜 which has the property that
B, T(B),...,T^{n−1}(B) are disjoint is called a Rokhlin tower. If the measure of the tower is
μ(B ∪ ··· ∪ T^{n−1}(B)) = nμ(B) = 1 − ε, we call it a (1 − ε)-Rokhlin tower. We say T can be
approximated arbitrarily well by Rokhlin towers, if for all ε > 0, there is a (1 − ε)-Rokhlin
tower.


Theorem: An aperiodic T can be approximated well by Rokhlin towers. 


The result was proven by Vladimir Abramovich Rokhlin in his thesis 1947 and independently
by Shizuo Kakutani in 1943. The lemma can be used to build Kakutani skyscrapers, which
are nice partitions associated to a transformation. This lemma allows one to approximate an
aperiodic transformation T by periodic transformations T_n. Just change T on T^{n−1}(B) so
that T_n^n(x) = x for all x in the tower. The theorem has been generalized by Donald Ornstein and
Benjamin Weiss to higher dimensions like Z^d actions of measure preserving transformations
where the aperiodicity assumption is replaced by the assumption that the action is free: for any
n ≠ 0, the set {x | T^n(x) = x} has zero measure. See [231] [292].


110. LAX APPROXIMATION 


On the group of all measurable, invertible transformations on the d-dimensional torus
X = T^d which preserve the Lebesgue volume measure, one has the metric


δ(T, S) = |d(T(x), S(x))|_∞ ,


where d is the geodesic distance on the flat torus and where |·|_∞ is the L^∞ supremum norm.
Let us call (T^d, T, μ) a toral dynamical system if T is a homeomorphism, a continuous
transformation with continuous inverse. A cube exchange transformation on T^d is a
periodic, piecewise affine measure-preserving transformation T which permutes rigidly all the
cubes Π_{i=1}^d [k_i/n, (k_i + 1)/n], where k_i ∈ {0,...,n − 1}. Every point in T^d is T periodic. A cube
exchange transformation is determined by a permutation of the set {1,...,n}^d. If it is cyclic,
the exchange transformation is called cyclic. A theorem of Lax [440] states that every toral
dynamical system can be approximated in the metric δ by cube exchange transformations. The
approximations can even be cyclic [16].


Theorem: Toral systems can be approximated by cyclic cube exchanges.
The result is due to Peter Lax [440]. The proof of this result uses Hall's marriage theorem in
graph theory, for which a 'book proof' exists. Periodic approximations of
symplectic maps work surprisingly well for relatively small n (see [555]). On the Pesin region
this can be explained in part by the shadowing property [868]. The approximation by cyclic
transformations makes long time stability questions look different [291].


111. SOBOLEV EMBEDDING 


All functions are defined on R^n, integrated ∫ over R^n and assumed to be locally integrable,
meaning that for every compact set K the Lebesgue integral ∫_K |f| dx is finite. For functions
in C_c^∞, which serve as test functions, partial derivatives ∂_i = ∂/∂x_i and more general
differential operators D^k = ∂_{x_1}^{k_1} ··· ∂_{x_n}^{k_n} can be applied. A function g is a weak partial
derivative of f if ∫ f ∂_i φ dx = −∫ g φ dx for all test functions φ. For p ∈ [1, ∞), the L^p space is
{f | ∫ |f|^p dx < ∞}. The Sobolev space W^{k,p} is the set of functions for which all weak
derivatives up to order k are in L^p. So W^{0,p} = L^p. The Hölder space C^{r,α} with r ∈ N, α ∈ (0, 1]
is defined as the set of functions for which all r'th derivatives are α-Hölder continuous. It is a
Banach space with norm max_{|k|≤r} ||D^k f||_∞ + max_{|k|=r} ||D^k f||_α, where ||f||_∞ is the supremum
norm and ||f||_α is the Hölder coefficient sup_{x≠y} |f(x) − f(y)|/|x − y|^α. The Sobolev embedding
theorem is


Theorem: If n < p and l = r + α < k − n/p, one has W^{k,p} ⊂ C^{r,α}.


[607] (which states this as Theorem 6.3.6) gives some history: generalized functions appeared
first in the work of Oliver Heaviside in the form of "operational calculus". Paul Dirac used
the formalism in quantum mechanics. In the 1930s, Kurt Otto Friedrichs, Salomon Bochner
and Sergei Sobolev defined weak solutions of PDE's. Schwartz used the C_c^∞ functions, smooth
functions of compact support. The embedding theorem means that the existence of k weak
derivatives implies the existence of actual derivatives. For p = 2, the spaces W^k are Hilbert
spaces and the theory is a bit simpler due to the availability of Fourier theory, where tempered
distributions flourished.
In that case, one can define for any real s > 0 the Hilbert space H^s as the subset of all f ∈ S'
for which (1 + |ξ|^2)^{s/2} f̂(ξ) is in L^2. The Schwartz test functions S consist of all C^∞ functions
having bounded semi norms ||φ||_k = max_{|α|+|β|≤k} ||x^β D^α φ||_∞ < ∞, where α, β ∈ N^n. Since S is
larger than the set of smooth functions of compact support, the dual space S' is smaller. Its
elements are tempered distributions. Sobolev embedding theorems like above allow one to show
that weak solutions of PDE's are smooth: for example, if the Poisson problem Δf = Vf with
smooth V is solved by a distribution f, then f is smooth. [91] [607]


112. WHITNEY EMBEDDING 


A smooth n-manifold M is a metric space equipped with a cover U_j = φ_j^{−1}(B) (with B = {x ∈
R^n | |x|^2 < 1}) or U_j = φ_j^{−1}(H) (with H = {x ∈ R^n | |x|^2 < 1, x_0 ≥ 0} and δH = {x ∈
H | x_0 = 0}) such that the homeomorphisms φ_j : U_j → B or φ_j : U_j → H lead to smooth
transition maps φ_{kj} = φ_j φ_k^{−1} from φ_k(U_j ∩ U_k) to φ_j(U_j ∩ U_k) which have the property that all
restrictions of φ_{kj} from δφ_k(U_j ∩ U_k) to δφ_j(U_j ∩ U_k) are smooth too. The boundary δM of
M now naturally is a smooth (n − 1) manifold, the atlas being given by the sets V_j = φ_j^{−1}(δH)
for the indices j which map φ_j : U_j → H. Two manifolds M, N are diffeomorphic if there
is a refinement {U_j, φ_j} of the atlas in M and a refinement {V_j, ψ_j} of the atlas in N such
that φ_j(U_j) = ψ_j(V_j). A manifold M can be smoothly embedded in R^k if there is a smooth
injective map f from M to R^k such that the image f(M) is diffeomorphic to M.


Theorem: Any n-manifold M can be smoothly embedded in R^{2n}.


The theorem has been proven by Hassler Whitney in 1944, who also was the first to give a
precise definition of manifold in 1936. The standard assumption is that M is second countable
Hausdorff but as every smooth finite dimensional manifold can be upgraded to be Riemannian,
the simpler metric assumption is no restriction of generality. The modern point of view is to
see M as a scheme over Euclidean n-space, more precisely as a ringed space, that is locally
the spectrum of the commutative ring C^∞(B) or C^∞(H). The set of manifolds is a category
in which the smooth maps M → N are the morphisms. The cover U_j defines an atlas and
the transition maps φ_{kj} allow one to port notions like smoothness from Euclidean space to M. The
maps φ_j^{−1} : B → M or φ_j^{−1} : H → M parametrize the sets U_j. See [692].


113. ARTIFICIAL INTELLIGENCE 


Like meta mathematics or reverse mathematics, the field of artificial intelligence (AI)
is a part of mathematics which also reflects on the subject itself. It is related to data science
(algorithms for data mining, and statistics), computation theory (like complexity theory),
language theory and especially grammar, evolutionary dynamics, optimization
problems (like solving optimal transport or extremal problems), solving inverse problems
(like developing algorithms for computer vision or optical character or speech recognition),
cognitive science as well as pedagogy in education (human or machine learning and human
motivation). There is no apparent “fundamental theorem" of AI (except maybe for Marvin
Minsky's "The most efficient way to solve a problem is to already know how to solve it." [495],
which is a surprisingly deep and insightful statement as modern AI agents like Alexa, Siri,
Google Home, IBM Watson or Cortana demonstrate; they compute little, they just know
or look up, or annoy you to look it up yourself...). But there is a theorem of Lebowski
on machine super intelligence which taps into the rather uncharted territory of machine
motivation:


Theorem: No AI will bother after hacking its own reward function. 


The picture is that once the AI has figured out the philosophy of the “Dude" in the Coen
brothers movie “The Big Lebowski", even repeated mischief does not bother it and it “goes bowling".
Objections are brushed away with “Well, this is your, like, opinion, man". Two examples
of human super intelligent units who have succeeded in hacking their own reward function are
Alexander Grothendieck and Grigori Perelman. The Lebowski theorem is due to Joscha Bach
[34], who stated this theorem of super intelligence in a tongue-in-cheek tweet. From a
mathematical point of view, the smartest way to “solve" an optimal transport problem is to
change the utility function. On a more serious level, the smartest way to “solve" the continuum
hypothesis is to change the axiom system. This might look like a cheat, but on a meta level,
more creativity is possible. Precursors of the Lebowski theme are Stanisław Lem's notion of a
mimicretin [442], a computer that plays stupid in order, once and for all, to be left in peace,
and the machine in [6] which develops humor and enjoys fooling humans with the answer to the
ultimate question: “42". This document entry is the analogue to the ultimate question: “What
is the fundamental theorem of AI"?


114. STOKES THEOREM


On a smooth orientable $n$-dimensional manifold $M$, one has $\Lambda^p$, the vector bundle of smooth
differential $p$-forms. Any $p$-form $F$ induces a volume form on a $p$-dimensional
sub-manifold $G$, defining so an integral $\int_G F$. The exterior derivative
$d : \Lambda^p \to \Lambda^{p+1}$ satisfies $d^2 = 0$ and defines an elliptic complex. There is a
natural Hodge duality isomorphism called “Hodge star" $* : \Lambda^p \to \Lambda^{n-p}$. Given a
$p$-form $F \in \Lambda^p$ and a $(p+1)$-dimensional compact oriented sub-manifold $G$ of $M$ with
boundary $\delta G$ compatible with the orientation of $G$, we have Stokes theorem:


Theorem: $\langle G, dF \rangle = \int_G dF = \int_{\delta G} F = \langle \delta G, F \rangle$.


The theorem states that the exterior derivative $d$ is dual to the boundary operator $\delta$. If $G$
is a connected 1-manifold with boundary, it is a curve with boundary $\delta G = \{A, B\}$. A 1-form
can be integrated over the curve $G$ by choosing the volume form $r'(t) dt$ induced on $G$
by a curve parametrization $r : [a,b] \to G$ and integrating $\int_a^b F(r(t)) \cdot r'(t) \, dt$,
which is the line integral. Stokes theorem is then the fundamental theorem of line integrals.
Take a 0-form $f$, which is a scalar function; its derivative $df$ is the gradient $F = \nabla f$. Then
$\int_a^b \nabla f(r(t)) \cdot r'(t) \, dt = f(B) - f(A)$. If $G$ is a two dimensional surface with
boundary $\delta G$ and $F$ is a 1-form, then the 2-form $dF$ is the curl of $F$. If $G$ is given as a
surface parametrization $r(u,v)$, one can apply $dF$ on the pair of tangent vectors $r_u, r_v$ and
integrate this $dF(r_u, r_v)$ over the surface $G$ to get $\int_G dF$. The Kelvin-Stokes theorem
tells that this is the same as the line integral $\int_{\delta G} F$. In the case of
$M = \mathbb{R}^3$, where $F = P dx + Q dy + R dz$ can be identified with a vector field
$F = [P, Q, R]$ and $dF$ with $\nabla \times F$, and where the integral of a 2-form $H$ over a
parametrized manifold $G$ is
$\int\int_R H(r(u,v))(r_u, r_v) \, du dv = \int\int_R H(r(u,v)) \cdot r_u \times r_v \, du dv$,
we get the classical Kelvin-Stokes theorem. If $F$ is a 2-form, then $dF$ is a 3-form which can be
integrated over a 3-manifold $G$. As $d : \Lambda^2 \to \Lambda^3$ can via Hodge duality naturally be
paired with $d^* : \Lambda^1 \to \Lambda^0$, which is the divergence, the divergence theorem
$\int\int\int_G \mathrm{div}(F) \, dx dy dz = \int\int_{\delta G} F \cdot dS$
relates a triple integral with a flux integral. Historical milestones start with the development
of the fundamental theorem of calculus (1666 Isaac Newton, 1668 James Gregory, Isaac
Barrow 1670 and Gottfried Leibniz 1693); the first rigorous proof was done by Cauchy in 1823
(the first textbook appearance in 1876 by Paul du Bois-Reymond). See [87]. In 1762, Joseph-Louis
Lagrange and in 1813 Karl-Friedrich Gauss looked at special cases of the divergence theorem,
Mikhail Ostrogradsky in 1826 and George Green in 1828 covered the general case. Green's theorem
in two dimensions was first stated by Augustin-Louis Cauchy in 1846 and Bernhard Riemann
in 1851. Stokes theorem first appeared in 1854 as an exam question but the theorem had
appeared already in a letter of William Thomson (Lord Kelvin) to Stokes in 1850, hence also the name
Kelvin-Stokes theorem. Vito Volterra in 1889 and Henri Poincaré in 1899 generalized the
theorems to higher dimensions. Differential forms were introduced in 1899 by Élie Cartan. The
$d$ notation for the exterior derivative was introduced in 1902 by Théophile de Donder. The ultimate
formulation above is from Cartan 1945. We followed Katz who noticed that only in 1959
this version started to appear in textbooks.
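
As a small numerical illustration of the two-dimensional case (Green's theorem), the following
sketch compares the line integral of a 1-form $F = P dx + Q dy$ over the unit circle with the
integral of its curl $Q_x - P_y$ over the unit disc. The field $P = -y^3$, $Q = x^3$, the midpoint
rule and all function names are choices made for this illustration only; both integrals should be
close to $3\pi/2$.

```python
import math

def line_integral(P, Q, n=2000):
    """Midpoint rule for the line integral of F = (P, Q) over the unit circle."""
    total = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)      # r(t)
        dx, dy = -math.sin(t), math.cos(t)   # r'(t)
        total += (P(x, y) * dx + Q(x, y) * dy) * dt
    return total

def curl_integral(curl, n=400):
    """Midpoint rule in polar coordinates for the integral of curl(F) over the unit disc."""
    total = 0.0
    dr, dt = 1.0 / n, 2 * math.pi / n
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            t = (j + 0.5) * dt
            total += curl(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

# Illustrative 1-form F = P dx + Q dy and its curl Q_x - P_y.
P = lambda x, y: -y ** 3
Q = lambda x, y: x ** 3
curl = lambda x, y: 3 * x ** 2 + 3 * y ** 2

boundary = line_integral(P, Q)   # integral of F over the boundary circle
interior = curl_integral(curl)   # integral of dF over the disc
print(boundary, interior)
```

The agreement of the two numbers is the statement $\int_G dF = \int_{\delta G} F$ for this
particular $G$ and $F$; it is a consistency check, not a proof.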


115. MOMENTS 


The Hausdorff moment problem asks for necessary and sufficient conditions for a sequence
$\mu_n$ to be realizable as a moment sequence $\int_0^1 x^n \, d\mu(x)$ for a Borel probability
measure on $[0,1]$. One can study the problem also in higher dimensions: for a multi-index
$n = (n_1, \dots, n_d)$ denote by $\mu_n = \int x_1^{n_1} \cdots x_d^{n_d} \, d\mu(x)$ the $n$'th
moment of a signed Borel measure $\mu$ on the unit cube $I^d = [0,1]^d \subset \mathbb{R}^d$. We say
$\mu_n$ is a moment configuration if there exists a measure $\mu$ which has $\mu_n$ as moments.
If $e_i$ denotes the standard basis in $\mathbb{Z}^d$, define the partial difference


$(\Delta_i a)_n = a_{n-e_i} - a_n$ and $\Delta^k = \prod_i \Delta_i^{k_i}$. We write
$\binom{n}{k} = \prod_{i=1}^d \binom{n_i}{k_i}$ and
$\sum_{k=0}^{n} = \sum_{k_1=0}^{n_1} \cdots \sum_{k_d=0}^{n_d}$. We say moments $\mu_n$ are
Hausdorff bounded if there exists a constant $C$ such that
$\sum_{k=0}^{n} |\binom{n}{k} (\Delta^k \mu)_n| \leq C$ for all $n \in \mathbb{N}^d$. The theorem of
Hausdorff-Hildebrandt-Schoenberg is


Theorem: Hausdorff bounded moments $\mu_n$ are generated by a measure $\mu$.


The above result is due to Theophil Henry Hildebrandt and Isaac Jacob Schoenberg from 1933.
[316]. Moments also allow to compare measures: a measure $\mu$ is called uniformly absolutely
continuous with respect to $\nu$ if there exists $f \in L^\infty(\nu)$ such that $\mu = f \nu$. A
positive probability measure $\mu$ is uniformly absolutely continuous with respect to a second
probability measure $\nu$ if and only if there exists a constant $C$ such that
$(\Delta^k \mu)_n \leq C \cdot (\Delta^k \nu)_n$ for all $k, n \in \mathbb{N}^d$.
In particular it gives a generalization of a result of Felix Hausdorff from 1921 assuring
that $\mu$ is positive if and only if $(\Delta^k \mu)_n \geq 0$ for all $k, n \in \mathbb{N}^d$.
Another special case is that $\mu$ is uniformly absolutely continuous with respect to Lebesgue
measure $\nu$ on $I^d$ if and only if there is a constant $C$ with
$|(\Delta^k \mu)_n| \leq C \binom{n}{k}^{-1} \prod_{i=1}^d (n_i+1)^{-1}$ for all $k$ and $n$.
Moments play an important role in statistics, when looking at moment generating functions
$\sum_n \mu_n t^n / n!$ of random variables $X$, where $\mu_n = E[X^n]$,
as well as in multivariate statistics, when looking at random vectors $(X_1, \dots, X_d)$, where
$\mu_n = E[X_1^{n_1} \cdots X_d^{n_d}]$ are multivariate moments. See [896] [589]
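
The difference operators can be made concrete in dimension $d = 1$. The sketch below, with
function names chosen only for this illustration, uses the expansion
$(\Delta^k a)_n = \sum_{j=0}^k (-1)^j \binom{k}{j} a_{n-k+j}$, which follows from
$(\Delta a)_n = a_{n-1} - a_n$, and applies it to the moments $\mu_n = 1/(n+1)$ of Lebesgue measure
on $[0,1]$. Exact rational arithmetic is used so that the identities can be checked without
rounding.

```python
from fractions import Fraction
from math import comb

def lebesgue_moment(n):
    """n-th moment of Lebesgue measure on [0, 1]: the integral of x^n."""
    return Fraction(1, n + 1)

def delta_k(moment, k, n):
    """(Delta^k mu)_n for the partial difference (Delta a)_n = a_{n-1} - a_n, with 0 <= k <= n."""
    return sum((-1) ** j * comb(k, j) * moment(n - k + j) for j in range(k + 1))

def hausdorff_sum(moment, n):
    """The sum over k = 0..n of |binom(n, k) (Delta^k mu)_n| from the boundedness condition."""
    return sum(abs(comb(n, k) * delta_k(moment, k, n)) for k in range(n + 1))

for n in range(6):
    print(n, [delta_k(lebesgue_moment, k, n) for k in range(n + 1)],
          hausdorff_sum(lebesgue_moment, n))
```

For a positive measure, $(\Delta^k \mu)_n = \int x^{n-k} (1-x)^k \, d\mu(x)$, so the Hausdorff sum
is the total mass by the binomial theorem; for Lebesgue measure it equals 1 for every $n$.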


116. MARTINGALES 


A sequence of random variables $X_1, X_2, \dots$ on a probability space $(\Omega, \mathcal{A}, P)$ is
called a discrete time stochastic process. We assume the $X_k$ to be in $L^2$, meaning that the
expectation $E[X_k^2] < \infty$ for all $k$. Given a sub-$\sigma$-algebra $\mathcal{B}$ of
$\mathcal{A}$, the conditional expectation $E[X|\mathcal{B}]$ is the projection of
$L^2(\Omega, \mathcal{A}, P)$ to $L^2(\Omega, \mathcal{B}, P)$. Extreme cases are
$E[X|\mathcal{A}] = X$ and $E[X|\{\emptyset, \Omega\}] = E[X]$. A finite set $Y_1, \dots, Y_n$ of
random variables generates a sub-$\sigma$-algebra $\mathcal{B}$ of $\mathcal{A}$, the
smallest $\sigma$-algebra for which all $Y_j$ are still measurable. Write
$E[X|Y_1, \dots, Y_n] = E[X|\mathcal{B}]$, where $\mathcal{B}$ is the $\sigma$-algebra generated by
$Y_1, \dots, Y_n$. A discrete time stochastic process is called a martingale if
$E[X_{n+1}|X_1, \dots, X_n] = X_n$ for all $n$. If the equal sign is replaced with $\leq$
then the process is called a super-martingale, if $\geq$ it is a sub-martingale. The random
walk $X_n = \sum_{k=1}^n Y_k$ defined by a sequence of independent $L^2$ random variables $Y_k$
with zero expectation is an example of a martingale because independence implies
$E[X_{n+1}|X_1, \dots, X_n] = X_n + E[Y_{n+1}] = X_n$. If $X$ and $M$ are two discrete time
stochastic processes, define the martingale transform (= discrete Ito integral) $X \cdot M$ as
the process $(X \cdot M)_n = \sum_{k=1}^n X_k (M_k - M_{k-1})$. If the process $X$ is bounded,
meaning that there exists a constant $C$ such that $E[|X_k|] \leq C$ for all $k$, then if $M$ is a
martingale, also $X \cdot M$ is a martingale.
The Doob martingale convergence theorem is


Theorem: For a bounded super-martingale $X$, the sequence $X_n$ converges in $L^1$.


The convergence theorem can be used to prove the optional stopping time theorem which
tells that the expected value of the process at a stopping time is the initial expected value. In
finance it is known as the fundamental theorem of asset pricing. If $\tau$ is a stopping time
adapted to a martingale $X_k$, it defines the random variable $X_\tau$ and $E[X_\tau] = E[X_0]$. For a
super-martingale one has $E[X_\tau] \leq E[X_0]$ and for a sub-martingale $E[X_\tau] \geq E[X_0]$.
The proof is obtained by defining the stopped process
$X_n^\tau = X_0 + \sum_{k=1}^{\min(\tau, n)} (X_k - X_{k-1})$ which is a martingale transform and so
a martingale. The martingale convergence theorem gives a limiting random variable $X_\tau$ and
because $E[X_n^\tau] = E[X_0]$ for all $n$, $E[X_\tau] = E[X_0]$. This is rephrased as “you can not
beat the system" [697]. A trivial implication is that one can not for example design a strategy
allowing to win in a fair game by designing a “clever stopping time" like betting on “red" in
roulette if 6 times “black" in a row has occurred. Nor does it help to follow the strategy to stop
the game once one has a first positive total win, which one can always reach by doubling the bet
in case of losing a game.
Martingales were introduced by Paul Lévy in 1934, the name “martingale" (referring to the just
mentioned doubling betting strategy) was added in a 1939 probability book of Jean Ville. The
theory was developed by Joseph Leo Doob in his book of 1953. [182]. See [697].
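
The statement “you can not beat the system" can be checked exactly in a toy case. The sketch below
assumes a fair coin random walk $X_k = Y_1 + \dots + Y_k$ with $Y_k = \pm 1$ and a finite horizon of
10 steps; the rule “stop at the first positive total win" and all function names are illustrative
choices. It enumerates all $2^{10}$ paths and averages the stopped value with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def stopped_value(steps, should_stop):
    """Value of the walk X_k = Y_1 + ... + Y_k at the first time should_stop fires."""
    x = 0
    for k, y in enumerate(steps, start=1):
        x += y
        if should_stop(k, x):
            return x
    return x  # horizon reached: the stopping time is capped at len(steps)

def expected_stopped_value(horizon, should_stop):
    """Exact expectation over all equally likely +-1 paths of the given length."""
    paths = list(product((-1, 1), repeat=horizon))
    return sum(Fraction(stopped_value(p, should_stop)) for p in paths) / len(paths)

# Illustrative "clever" rule: quit as soon as the total win is +1.
first_win = lambda k, x: x == 1

expectation = expected_stopped_value(10, first_win)
print(expectation)
```

Most paths end with a win of 1, but the rare long losing streaks compensate exactly, so the
expectation stays at the initial value $E[X_0] = 0$.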


117. THEOREMA EGREGIUM 


A Riemannian metric on a two-dimensional manifold $S$ defines the quadratic form
$I = E du^2 + 2F du dv + G dv^2$ called first fundamental form on the surface. If $r(u,v)$ is a
parameterization of $S$, then $E = r_u \cdot r_u$, $F = r_u \cdot r_v$ and $G = r_v \cdot r_v$. The
second fundamental form of $S$ is $II = L du^2 + 2M du dv + N dv^2$, where $L = r_{uu} \cdot n$,
$M = r_{uv} \cdot n$, $N = r_{vv} \cdot n$, written using the normal vector
$n = (r_u \times r_v)/|r_u \times r_v|$. The Gaussian curvature
$K = \det(II)/\det(I) = (LN - M^2)/(EG - F^2)$ is defined using the embedding $r : R \to S$ in
space $\mathbb{R}^3$, but it actually only depends on the intrinsic metric, the first fundamental
form. This is the Theorema egregium of Gauss:


Theorem: The Gaussian curvature only depends on the Riemannian metric. 


Gauss himself already gave explicit formulas, but a formula of Brioschi gives the curvature
explicitly as a ratio of determinants involving $E, F, G$ as well as first and second derivatives
of them. In the case when the surface is given as a graph $z = f(x,y)$, one can give
$K = D/(1+|\nabla f|^2)^2$, where $D = f_{xx} f_{yy} - f_{xy}^2$ is the discriminant, because
$\det(I) = 1+|\nabla f|^2$ and $\det(II) = D/(1+|\nabla f|^2)$. If the surface is rotated in space
so that $(u,v)$ is a critical point for $f$, then the discriminant $D$ is equal to the curvature.
One can see the independence of the embedding also from the Puiseux formula
$K = \lim_{r \to 0} 3(|S_0(r)| - |S(r)|)/(\pi r^3)$, where $|S_0(r)| = 2\pi r$ is the circumference
of the circle $S_0(r)$ in the flat case and $|S(r)|$ is the circumference of the geodesic circle of
radius $r$ on $S$. The Theorema Egregium also follows from Gauss-Bonnet as the latter allows to write
the curvature in terms of the angle sum of a geodesic infinitesimal triangle compared with the angle
sum $\pi$ of a flat triangle. As the angle sums are entirely defined intrinsically, the curvature is
intrinsic. The “Theorema Egregium" was found by Karl-Friedrich Gauss in 1827 and published in 1828
in “Disquisitiones generales circa superficies curvas". It is not an accident that Gauss was
occupied with concrete geodesic triangulation problems too.
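
The formula $K = (LN - M^2)/(EG - F^2)$ is easy to evaluate numerically. The following sketch
approximates the derivatives $r_u, r_v, r_{uu}, r_{uv}, r_{vv}$ of a parametrization by central
finite differences (the step size $h$ and the helper names are assumptions of this illustration)
and applies it to a sphere of radius 2, where one expects $K = 1/4$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def combine(*terms):
    """Linear combination sum(c * p) of 3-vectors given as (c, p) pairs."""
    return tuple(sum(c * p[i] for c, p in terms) for i in range(3))

def gaussian_curvature(r, u, v, h=1e-3):
    """K = (LN - M^2)/(EG - F^2) with derivatives from central differences."""
    c1, c2 = 1 / (2 * h), 1 / (h * h)
    ru = combine((c1, r(u + h, v)), (-c1, r(u - h, v)))
    rv = combine((c1, r(u, v + h)), (-c1, r(u, v - h)))
    ruu = combine((c2, r(u + h, v)), (-2 * c2, r(u, v)), (c2, r(u - h, v)))
    rvv = combine((c2, r(u, v + h)), (-2 * c2, r(u, v)), (c2, r(u, v - h)))
    ruv = combine((c2 / 4, r(u + h, v + h)), (-c2 / 4, r(u + h, v - h)),
                  (-c2 / 4, r(u - h, v + h)), (c2 / 4, r(u - h, v - h)))
    E, F, G = dot(ru, ru), dot(ru, rv), dot(rv, rv)   # first fundamental form
    nvec = cross(ru, rv)
    length = math.sqrt(dot(nvec, nvec))
    n = tuple(x / length for x in nvec)               # unit normal
    L, M, N = dot(ruu, n), dot(ruv, n), dot(rvv, n)   # second fundamental form
    return (L * N - M * M) / (E * G - F * F)

def sphere(u, v, R=2.0):
    return (R * math.cos(u) * math.sin(v), R * math.sin(u) * math.sin(v), R * math.cos(v))

K = gaussian_curvature(sphere, 0.7, 1.1)
print(K)
```

A cylinder $(\cos u, \sin u, v)$ gives a value near 0 with the same routine, in line with the fact
that it is locally isometric to the plane.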


118. ENTROPY 


Given a random variable $X$ on a probability space $(\Omega, \mathcal{A}, P)$ which is finite and
discrete in the sense that it takes only finitely many values, the entropy is defined as
$S(X) = -\sum_x p_x \log(p_x)$, where $p_x = P[X = x]$. To compare, for a random variable $X$ with
cumulative distribution function $F(x) = P[X \leq x]$ having a continuous derivative $F' = f$, the
entropy is defined as $S(X) = -\int f(x) \log(f(x)) \, dx$, allowing the value $-\infty$ if the
integral does not converge. (We always read $p \log(p) = 0$ if $p = 0$.) In the continuous case, one
also calls this the differential entropy. Two discrete random variables $X, Y$ are called
independent if one can realize them on a product probability space $\Omega = A \times B$ so that
$X(a,b) = X(a)$ and $Y(a,b) = Y(b)$ for some functions $X : A \to \mathbb{R}$, $Y : B \to \mathbb{R}$.
Independence implies that the random variables are uncorrelated, $E[XY] = E[X]E[Y]$, and that the
entropy adds up: $S(XY) = S(X) + S(Y)$.
We can write $S(X) = E[\log(W)]$, where $W$ is the “Wahrscheinlichkeit" random variable
assigning to $\omega \in \Omega$ the value $W(\omega) = 1/p_x$ if $X(\omega) = x$. Let us say, a
functional on discrete random variables is additive if it is of the form $H(X) = \sum_x f(p_x)$ for
some continuous function $f$ for which $f(t)/t$ is monotone. We say it is multiplicative if
$H(XY) = H(X) + H(Y)$ for independent random variables. The functional is normalized if
$H(X) = \log(2)$ if $X$ is a random variable taking two values $\{0,1\}$ with probability
$p_0 = p_1 = 1/2$. Shannon's theorem is:


Theorem: Any normalized, additive and multiplicative H is entropy S. 


The word “entropy" was introduced by Rudolf Clausius in 1850 [571]. Ludwig Boltzmann saw
the importance of $\frac{d}{dt} S \geq 0$ in the context of heat and wrote in 1872
$S = k_B \log(W)$, where $W(x) = 1/p_x$ is the inverse “Wahrscheinlichkeit" that a state has the
value $x$. His equation is understood as the expectation
$S = k_B E[\log(W)] = k_B \sum_x p_x \log(W(x))$ which is, up to the constant $k_B$, the Shannon
entropy, introduced in 1948 by Claude Shannon in the context of information theory. (Shannon
characterized functionals $H$ with the property that if $H$ is continuous in $p$, then for random
variables $X_n$ with $p_x(X_n) = 1/n$, one has $H(X_n)/n \leq H(X_m)/m$ if $n \leq m$, and if $X, Y$
are two random variables so that the finite $\sigma$-algebra $\mathcal{A}$ defined by $X$ is a
sub-$\sigma$-algebra of the $\sigma$-algebra $\mathcal{B}$ defined by $Y$, then
$H(Y) = H(X) + \sum_x p_x H(Y_x)$, where $Y_x(\omega) = Y(\omega)$ for $\omega \in \{X = x\}$. One can
show that these Shannon conditions are equivalent to the combination of being additive and
multiplicative.) In statistical thermodynamics, where $p_x$ is the probability of a micro-state,
$k_B S$ is also called the Gibbs entropy, where $k_B$ is the Boltzmann constant. For
general random variables $X$ on $(\Omega, \mathcal{A}, P)$ and a finite $\sigma$-sub-algebra
$\mathcal{B}$, Gibbs looked in 1902 at coarse grained entropy, which is the entropy of the
conditional expectation $Y = E[X|\mathcal{B}]$, which is now a random variable $Y$ taking only
finitely many values so that entropy is defined.


See [598].
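
The two properties singled out above can be checked on small examples. In this sketch a discrete
random variable is represented just by its list of probabilities $p_x$, and a pair of independent
variables by the product distribution; the distributions and names are made up for the
illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy S = -sum p log p, reading p log p = 0 for p = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def product_distribution(px, py):
    """Joint distribution of two independent variables with distributions px and py."""
    return [p * q for p in px for q in py]

px = [0.5, 0.25, 0.25]
py = [0.2, 0.8]

joint = entropy(product_distribution(px, py))
print(joint, entropy(px) + entropy(py))   # entropy adds up for independent variables
print(entropy([0.5, 0.5]), math.log(2))   # normalization for a fair coin
```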


119. MOUNTAIN PASS 


Let $H$ be a Hilbert space, and let $f$ be a twice Fréchet differentiable function from $H$ to
$\mathbb{R}$. The Fréchet derivative $A = f'$ at a point $x \in H$ is a linear operator $A$
satisfying $f(x+h) - f(x) - Ah = o(h)$ for $h \to 0$. A point $x \in H$ is called a critical point
of $f$ if $f'(x) = 0$. The functional satisfies the Palais-Smale condition, if every sequence $x_k$
in $H$ for which $\{f(x_k)\}$ is bounded and $f'(x_k) \to 0$ has a convergent subsequence in the
closure of $\{x_k\}_{k \in \mathbb{N}}$. A pair of points $a, b \in H$ defines a mountain pass, if
there exist $\epsilon > 0$ and $r > 0$ such that $f(x) \geq f(a) + \epsilon$ on
$S_r(a) = \{x \in H \mid ||x - a|| = r\}$, $f$ is not constant on $S_r(a)$ and
$f(b) \leq f(a)$. A critical point is called a saddle if it is neither a maximum nor a minimum of $f$.


Theorem: If a Palais-Smale f has a mountain pass, it features a saddle. 


The idea is to look at all continuous paths $\gamma$ from $a$ to $b$ parametrized by $t \in [0,1]$.
For each path $\gamma$, the value $f(\gamma(t))$ has to be maximal for some time $t \in [0,1]$; call
this maximum $c_\gamma$. The infimum over all these values $c_\gamma$ is a critical value of $f$. The
mountain pass condition leads to a “mountain ridge" and the critical point is a “mountain pass",
hence the name. The example $(2 \exp(-x^2 - y^2) - 1)(x^2 + y^2)$ with $a = (0,0)$, $b = (1,0)$ and
$S_r(a)$ with $r = 1/2$ shows that the non-constant condition is necessary for a saddle point. The
reason for sticking with a Hilbert space is that it is easier to realize the compactness condition
due to weak star compactness of the unit ball. But it is possible to weaken the conditions and work
with Banach manifolds $X$ and continuous Gâteaux derivatives $f' : X \to X^*$, where $X$ has the
strong and $X^*$ the weak-$*$ topology. It is difficult to pinpoint historically the first use of
the mountain pass principle as it must have been known intuitively since antiquity. The crucial
Palais-Smale compactness condition which makes the theorem work in infinite dimensions appeared in
1964. It is also called condition (C), a notion which already appeared in the original paper [530].
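
The minimax idea can be tried on a finite dimensional toy example. The sketch below assumes the
double well $f(x,y) = (x^2-1)^2 + y^2$ with $a = (-1,0)$, $b = (1,0)$ and, instead of all
continuous paths, only a one-parameter family of arcs $\gamma_s(t) = (-1+2t, 4st(1-t))$; function,
family and grid sizes are choices made for this illustration. The smallest path maximum over this
family is the value 1 of $f$ at the saddle $(0,0)$.

```python
def f(x, y):
    """Illustrative double well with minima at (-1, 0), (1, 0) and a saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + y * y

def path(s, t):
    """Arc from a = (-1, 0) to b = (1, 0); the parameter s controls how far it bulges."""
    return (-1.0 + 2.0 * t, 4.0 * s * t * (1.0 - t))

def max_along_path(s, steps=200):
    """The value c_gamma: the maximum of f along the sampled path."""
    return max(f(*path(s, k / steps)) for k in range(steps + 1))

def mountain_pass_level(bulges):
    """Infimum of the path maxima, taken over the given family of paths only."""
    return min(max_along_path(s) for s in bulges)

bulges = [k / 10 for k in range(-10, 11)]
level = mountain_pass_level(bulges)
print(level, f(0.0, 0.0))
```

Since only a restricted family of paths is searched, such a computation in general gives an upper
bound for the mountain pass level; here the straight segment already passes through the saddle.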


120. EXPONENTIAL SUMS 


Given a smooth function $f : \mathbb{R} \to \mathbb{R}$ which maps integers to integers, one can
look at exponential sums $\sum_{x=0}^{n-1} \exp(i \pi f(x))$. An example is the Gaussian sum
$\sum_{x=0}^{n-1} \exp(i a x^2)$. There are lots of interesting relations and estimates. One of the
magical formulas is the Landsberg-Schaar relation for the finite sums
$S(q,p) = \frac{1}{\sqrt{p}} \sum_{x=0}^{p-1} \exp(i \pi x^2 q/p)$.


Theorem: If $p, q$ are positive and odd integers, then $S(2q, p) = e^{i\pi/4} S(-p, 2q)$.


One has $S(1,p) = (1/\sqrt{p}) \sum_{x=0}^{p-1} \exp(i \pi x^2/p) = e^{i\pi/4}$ for all even
positive integers $p$ and
$S(2,p) = (1/\sqrt{p}) \sum_{x=0}^{p-1} \exp(2 \pi i x^2/p) = 1$ if $p = 4k+1$ and $i$ if
$p = 4k-1$. The method of exponential sums has been expanded especially by Vinogradov's papers and
used for number theory like for quadratic reciprocity [506]. The topic is of interest also outside
of number theory, for example in dynamical systems theory as Furstenberg has demonstrated. An
ergodic theorist would look at the dynamical system $T(x,y) = (x+2y+1, y+1)$ on the 2-torus
$\mathbb{T}^2 = \mathbb{R}^2/(\pi \mathbb{Z})^2$ and define $g_\alpha(x,y) = \exp(i \pi x \alpha)$.
Since the orbit of the origin under this toral map is $T^n(0,0) = (n^2, n)$, the exponential sum can
be written as a Birkhoff sum $\sum_{k=0}^{p-1} g_{q/p}(T^k(0,0))$ along a particular orbit of a
dynamical system. Results as those mentioned above show that the random walk grows like $\sqrt{p}$,
similarly as in a random setting. Now, since the dynamical system is minimal, the growth rate should
not depend on the initial point and $\pi q/p$ should be replaceable by any irrational $\alpha$ and no
more be linked to the length of the orbit. The problem is then to study the growth rate of the
stochastic process $S^t(x,y) = \sum_{k=0}^{t-1} g(T^k(x,y))$ (= sequence of random variables) for any
continuous $g$ with zero expectation, which by Fourier boils down to looking at exponential sums. Of
course $S^t(x,y)/t \to 0$ by Birkhoff's ergodic theorem, but as in the law of iterated logarithm one
is interested in precise growth rates. This can be subtle. Already in the simpler case of an
integrable $T(x) = x + \alpha$ on the 1-torus, there is Denjoy-Koksma theory which shows that the
growth rate depends on Diophantine properties of $\pi \alpha$. Unlike irrational rotations, the
Furstenberg type skew systems $T$ leading to the theta functions are not integrable: they are not
conjugated to a group translation (there is some randomness, even if it is so weak that the
Kolmogorov-Sinai entropy is zero). The dichotomy between structure and randomness and
especially the similarities between dynamical and number theoretical set-ups has been discussed
in [651].
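
The Landsberg-Schaar relation can be tested numerically for small $p, q$. The following sketch
implements $S(q,p) = \frac{1}{\sqrt{p}} \sum_{x=0}^{p-1} \exp(i \pi x^2 q/p)$ with floating point
complex arithmetic and measures the difference of the two sides of
$S(2q, p) = e^{i\pi/4} S(-p, 2q)$; the function names are chosen for this illustration.

```python
import cmath
import math

def S(q, p):
    """Normalized quadratic exponential sum (1/sqrt(p)) * sum_{x<p} exp(i pi x^2 q / p)."""
    total = sum(cmath.exp(1j * math.pi * x * x * q / p) for x in range(p))
    return total / math.sqrt(p)

def landsberg_schaar_gap(p, q):
    """Absolute difference between the two sides of S(2q, p) = e^{i pi/4} S(-p, 2q)."""
    return abs(S(2 * q, p) - cmath.exp(1j * math.pi / 4) * S(-p, 2 * q))

for p, q in [(1, 1), (3, 1), (3, 5), (7, 9)]:
    print(p, q, landsberg_schaar_gap(p, q))

# Classical Gauss sums: S(2, p) for an odd prime p of the form 4k+1 and of the form 4k-1.
print(S(2, 5), S(2, 7))
```

The gaps are of the size of the rounding error, and $S(2,5)$, $S(2,7)$ come out close to $1$ and
$i$, in agreement with the values of the classical Gauss sum.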


121. SPHERE THEOREM 


A compact Riemannian manifold $M$ is said to have positive curvature, if all sectional
curvatures are positive. The sectional curvature at a point $x \in M$ in the direction of the
2-dimensional plane $\Sigma \subset T_x M$ is defined as the Gaussian curvature of the surface
$\exp_x(\Sigma) \subset M$ at the point. In terms of the Riemannian curvature tensor
$R : T_x M^4 \to \mathbb{R}$ and an orthonormal basis $\{u, v\}$ spanning $\Sigma$, this is
$R(u,v,u,v)$. The curvature is called quarter pinched, if the sectional curvature is in the
interval $(1,4]$ at all points $x \in M$. In particular, a quarter pinched manifold is a manifold
with positive curvature. We say here that a compact Riemannian manifold is a sphere if it is
homeomorphic to a sphere. The sphere theorem is:


Theorem: A simply-connected quarter pinched manifold is a sphere.


The theorem was proven by Marcel Berger and Wilhelm Klingenberg in 1960. That a pinching
condition would imply a manifold to be a sphere had been conjectured already by Heinz Hopf.
Hopf himself proved in 1926 that constant sectional curvature implies that $M$ is even isometric
to a sphere. Harry Rauch, after visiting Hopf in Zürich in the 1940's, proved that a
3/4-pinched simply connected manifold is a sphere. In 2007, Simon Brendle and Richard Schoen
proved that the theorem even holds with the stronger conclusion that $M$ is a $d$-sphere (meaning
that $M$ is diffeomorphic to the Euclidean $d$-sphere $\{|x|^2 = 1\} \subset \mathbb{R}^{d+1}$).
This is the differentiable sphere theorem. Since John Milnor had given in 1956 examples of spheres
which are homeomorphic but not diffeomorphic to the standard sphere (so called exotic spheres,
spheres which carry a smooth maximal atlas different from the standard one), the differentiable
sphere theorem is a substantial improvement on the topological sphere theorem. It needed completely
new techniques, especially the Ricci flow $\dot{g} = -2 \mathrm{Ric}(g)$ of Richard Hamilton, which
is a weakly parabolic partial differential equation deforming the metric $g$ and uses the Ricci
curvature $\mathrm{Ric}$ of $g$. See [53] [85].


122. WORD PROBLEM 


The word problem in a finitely presented group $G = \langle g | r \rangle$ with generators $g$ and
relations $r$ is the problem to decide whether two given words $v, w$ represent the same
group element in $G$ or not. The word problem is not solvable in general: there are concrete
finitely presented groups in which it is not. The following theorem of Boone and Higman relates
the solvability to algebra. A group is simple if its only normal subgroups are the trivial
group and the group itself.


Theorem: Finitely presented simple groups have a solvable word problem. 


More generally, if $G \subset H \subset K$ where $H$ is simple and $K$ is finitely presented, then
$G$ has a solvable word problem. Max Dehn proposed the word problem in 1911. Pyotr Novikov in
1955 proved that the word problem is undecidable for finitely presented groups. William W.
Boone and Graham Higman proved the theorem in 1974 [72]. Higman would in the same year
also find an example of an infinite finitely presented simple group. The non-solvability of the
word problem implies the non-solvability of the homeomorphism problem for $n$-manifolds with
$n \geq 4$. See [707].


123. FINITE SIMPLE GROUPS


A finite group $(G, *, 1)$ is a finite set $G$ with an operation $* : G \times G \to G$ and an
element 1, such that the operation is associative, $(a*b)*c = a*(b*c)$ for all $a, b, c$, such that
$a*1 = 1*a = a$ for every $a$ and such that every $a$ has an inverse $a^{-1}$ satisfying
$a*a^{-1} = 1$. A group $G$ is simple if the only normal subgroups of $G$ are the trivial group
$\{1\}$ and the group itself. A subgroup $H$ of $G$ is called normal if $gH = Hg$ for all $g$.
Simple groups play the role of the primes in the set of integers. A theorem of Jordan-Hölder is
that the composition series of $G$ (with simple groups as quotients) is unique up to permutations
and isomorphisms. The classification theorem of finite simple groups is


Theorem: Every finite simple group is cyclic, alternating, Lie or sporadic. 


There are 18 so called regular families of finite simple groups made of cyclic, alternating
and 16 Lie type groups. Then there are 26 so called sporadic groups, of which 20 are happy
groups as they are subgroups or sub-quotients of the monster and 6 are pariahs, outcasts
which are not under the spell of the monster. The classification was a huge collaborative effort
with more than 100 authors, covering 500 journal articles. According to Daniel Gorenstein, the
classification was completed in 1981 and fixes were applied until 2004. (Michael Aschbacher
and Stephen Smith resolved the last problems, which took several years and led to a full proof
of 1300 pages.) A second generation cleaned-up proof written with more details is under way
and currently has 5000 pages. Some history is given in [615].


124. GOD NUMBER 


A finite finitely presented group $G = \langle g | r \rangle$, like for example the Rubik group,
defines the Cayley graph $\Gamma$ in which the group elements are the nodes and where two nodes
$a, b$ are connected if there is a generator $x$ in $g$ such that $xa = b$. The diameter of the graph
is the largest geodesic distance between two nodes in $\Gamma$. It is also called the God number
of the puzzle. The Rubik cube is an example of a finitely presented group. The original
$3 \times 3 \times 3$ cube allows to permute the 26 boundary cubes using the 18 possible rotations
of the 6 faces as generators. From the $X = 8! \, 12! \, 3^8 \, 2^{12}$ possible ways to physically
build the cube, only $|G| = X/12 = 43252003274489856000$ are present in the Rubik group $G$. Some of
the positions, “quarks", can not be realized but combinations of them, “mesons" or “baryons", can.


Theorem: The God number of the Rubik cube is 20. 


This means that from any position, one could, in principle, solve the puzzle in 20 moves. Note
that one has to specify clearly the generators of the group as this defines the Cayley graph
and so a metric on the group. The lower bound 18 had already been known in 1980 because a
counting of all the possible moves with 17 steps produced fewer elements. The lower bound 20
came in 1995 when Michael Reid proved that the super-flip position (where the edges are
all flipped but corners are correct) needs 20 moves. In July 2010, using about 35 CPU years,
a team around Tomas Rokicki established that the God number is 20. They partitioned the
possible group positions into roughly 2 billion sets of 20 billion positions each. Using symmetry
they reduced it to 55 million sets, then found solutions for any of the positions in these
sets. It appears silly to put a God number computation as a fundamental theorem, but as
the status of the Rubik cube is enormous, as it has been one of the most popular puzzles for
decades and is a prototype for many other similar puzzles, the choice can be defended.
One can ask to compute the God number of any finitely presented finite group. Interesting
in general is the complexity of evaluating that functional. The simplest nontrivial Rubik
cuboid is the $2 \times 2 \times 1$ one. It has 6 positions and 2 generators $a, b$. The finitely
presented group is $\langle a, b \mid a^2 = b^2 = (ab)^3 = 1 \rangle$ which is the dihedral group
$D_3$. Its group elements are
$G = \{1, a = babab, ab = baba, aba = bab, abab = ba, ababa = b\}$. The group is isomorphic to the
symmetry group of the equilateral triangle, generated by the two reflections $a, b$ at two
altitude lines. The God number of that group is 3 because the Cayley graph $\Gamma$ is the cyclic
graph $C_6$. The puzzle solver has here “no other choice than solving the puzzle", because one is
forced to make a non-trivial move in each step. Further background can be found in the literature on
general combinatorial group theory and in a recent autobiography of Ernő Rubik.
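
For small groups the God number can be computed directly by a breadth-first search on the Cayley
graph. The sketch below realizes the dihedral group $D_3$ by permutations of three points, with
the two reflections $a, b$ represented by transpositions; this concrete representation and the
function names are assumptions of the illustration. Since a Cayley graph looks the same from every
node, the largest distance from the identity is the diameter.

```python
from collections import deque

def compose(s, t):
    """Composition of permutations given as tuples: (s o t)(i) = s[t[i]]."""
    return tuple(s[i] for i in t)

def cayley_distances(generators, identity):
    """Breadth-first search from the identity; edges join g and x*g for each generator x."""
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for x in generators:
            h = compose(x, g)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist

a = (1, 0, 2)   # reflection swapping the points 0 and 1
b = (0, 2, 1)   # reflection swapping the points 1 and 2

dist = cayley_distances([a, b], (0, 1, 2))
god_number = max(dist.values())
print(len(dist), god_number)   # number of group elements and diameter
```

The same search works in principle for any finite group given by generators, but for the Rubik
group with its $4.3 \cdot 10^{19}$ elements it is far out of reach, which is why the computation
mentioned above needed symmetry reductions and years of CPU time.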


125. SARD THEOREM 


Let $f : M \to N$ be a smooth map between smooth manifolds $M, N$ of dimension $\dim(M) = m$
and $\dim(N) = n$. A point $x \in M$ is called a critical point of $f$, if the Jacobian
$n \times m$ matrix $df(x)$ has rank smaller than both $m$ and $n$. If $C$ is the set of critical
points, then $f(C) \subset N$ is called the critical set of $f$. The volume measure on $N$ is a
choice of a volume form, obtained for example after introducing a Riemannian metric. Sard's theorem is


Theorem: The critical set of $f : M \to N$ has zero volume measure in $N$.


The theorem applied to a smooth map $f : M \to \mathbb{R}$ tells that for almost all $c$, the set
$f^{-1}(c)$ is a smooth hypersurface of $M$ or else empty. The latter can happen if $f$ is constant.
We assumed $C^\infty$ but one can relax the smoothness assumption of $f$. If $n \geq m$, then $f$
needs only to be continuously differentiable. If $n < m$, then $f$ needs to be in $C^{m-n+1}$. The
case when $N$ is one-dimensional has been covered by Anthony Morse (who is unrelated to Marston Morse)
in 1939 and by Arthur Sard in general in 1942. A bit confusing is that Marston Morse (not
Anthony) covered the case $m = 1, 2, 3$ and Sard the case $m = 4, 5, 6$ in unpublished papers
before, as mentioned in a footnote to [585]. Sard also notes already that examples of Hassler
Whitney show that the smoothness condition can not be relaxed. Sard formulated the results
for $M = \mathbb{R}^m$ and $N = \mathbb{R}^n$ (by the way with the same choice $f : M \to N$ as done
here and not as in many other places). The manifold case appears for example in [628].


126. ELLIPTIC CURVES 


An elliptic curve is a plane algebraic curve defined by the points satisfying the Weierstrass
equation $y^2 = x^3 + ax + b = f(x)$. One assumes the curve to be non-singular, meaning
that the discriminant $\Delta = -16(4a^3 + 27b^2)$ is not zero. This assures that there are no
cusps nor multiple roots for the simple reason that the explicit solution formulas for roots of
$f(x) = 0$ involve only square roots of $\Delta$. A curve is an Abelian variety, if it carries an
Abelian algebraic group structure, meaning that the addition of a point defines a morphism of
the variety.


Theorem: Elliptic curves are Abelian varieties. 


I presented the God number problem in the 1980s as an undergraduate in a logic seminar of Ernst Specker
and the choice of topic had been objected to by Specker himself as a too “narrow problem". But the Rubik
cube and its group properties have “cult status". The object was one of the triggers for me to study math.
The theorem seems to have been first realized by Henri Poincaré in 1901. Weierstrass had 
earlier used the Weierstrass ℘ function in the case of elliptic curves over the complex plane. To 
define the group multiplication, one uses the chord-tangent construction: first add a point O 
called the point at infinity which serves as the zero in the group. Then define −P as the point 
obtained by reflecting P at the x-axis. The group multiplication between two different points 
P, Q on the curve is defined to be −R if R is the third point of intersection of the line through P, Q 
with the curve. If P = Q, then R is defined to be the intersection of the tangent with the curve. 
If there is no intersection, that is if P = Q is an inflection point, then one defines P + P = −P. 
Finally, define P + O = O + P = P and P + (−P) = O. This recipe can be given explicitly 
in coordinates, allowing to define the multiplication in any field of characteristic different from 
2 or 3. The group structure on elliptic curves over finite fields provides a rich source of finite 
Abelian groups which can be used for cryptological purposes, the so called elliptic curve 
cryptography (ECC). Any procedure, like public key, Diffie-Hellman or factorization attacks on 
integers can be done using groups given by elliptic curves [676]. 
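
The chord-tangent recipe can be sketched in coordinates as follows. This is a minimal illustration in Python, assuming a curve y² = x³ + ax + b over a prime field F_p with p > 3; the function names `ec_neg`, `ec_add`, `on_curve` and the sample curve y² = x³ + 2x + 2 over F_17 are choices made here for illustration and do not come from the text.

```python
# Chord-tangent addition on y^2 = x^3 + a*x + b over the prime field F_p (p > 3).
# The point at infinity O is represented by None. All names are illustrative.

def on_curve(P, a, b, p):
    """Check that P is O or satisfies the curve equation modulo p."""
    if P is None:
        return True
    x, y = P
    return (y * y - (x ** 3 + a * x + b)) % p == 0

def ec_neg(P, p):
    """Return -P, the reflection of P at the x-axis."""
    if P is None:
        return None
    x, y = P
    return (x, (-y) % p)

def ec_add(P, Q, a, p):
    """Return P + Q, that is -R where R is the third intersection point."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = O; this also covers a vertical tangent (y = 0)
    if P == Q:
        slope = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p  # tangent
    else:
        slope = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p         # chord
    x3 = (slope * slope - x1 - x2) % p
    y3 = (slope * (x1 - x3) - y1) % p  # the reflection is already applied
    return (x3, y3)

# Sample computation on y^2 = x^3 + 2x + 2 over F_17 (an assumed toy curve).
A, B, PRIME = 2, 2, 17
P = (5, 1)
P2 = ec_add(P, P, A, PRIME)
P3 = ec_add(P, P2, A, PRIME)
```

The modular inverse uses the three-argument `pow` with exponent −1, which requires Python 3.8 or later. For an inflection point the tangent formula returns R = P, so that P + P = −P comes out automatically.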


127. BILLIARDS 


Billiards are the geodesic flow on a smooth compact n-manifold M with boundary. The 
dynamics is extended through the boundary by applying the law of reflection. While the 
geodesic flow X^t is Hamiltonian on the unit tangent bundle SM, the billiard flow is only 
piecewise smooth and also the return map to the boundary is not continuous in general, but it 
is a map preserving a natural volume so that one can look at ergodic theory. Already difficult 
are flat 2-manifolds M homeomorphic to a disc having convex boundary homeomorphic to a 
circle. For smooth convex tables this leads to a return map T on the annulus X = 𝕋 × [−1, 1] 
which is C^(r−1) smooth if the boundary is C^r [183]. It defines a monotone twist map in the 
sense that it preserves the boundary, is area and orientation preserving and satisfies the twist 
condition that y → T(x, y) is strictly monotone. A Bunimovich stadium is the 2-manifold 
with boundary obtained by taking the convex hull of two discs of equal radius in R² with 
different centers. The billiard map is called chaotic if it is ergodic and the Kolmogorov-Sinai 
entropy is positive. By Pesin theory, this metric entropy is the Lyapunov exponent which 
is the exponential growth rate of the Jacobian dT^n (and constant almost everywhere due to 
ergodicity). There are coordinates in the tangent bundle of the annulus X in which dT is the 
composition of a horizontal shear with strength L(x, y), where L is the trajectory length before 
the impact, with a vertical shear with strength −2κ(x)/sin(θ), where κ(x) is the curvature of the 
curve at the impact x and y = cos(θ), with impact angle θ ∈ [0, π] between the tangent and 
the trajectory. 


Theorem: The Bunimovich stadium billiard is chaotic. 


Jacques Hadamard in 1898 and Emil Artin in 1924 already looked at the geodesic flow on 
a surface of constant negative curvature. Yakov Sinai constructed in 1970 the first chaotic 
billiards, the Lorentz gas or Sinai billiard. An example where Sinai’s result applies is the 
hypocycloid x^(2/3) + y^(2/3) = 1. The Bernoulli property was established by Giovanni Gallavotti 
and Donald Ornstein in 1974. In 1973, Vladimir Lazutkin proved, using Moser’s twist map 
theorem, that a generic smooth convex two-dimensional billiard cannot be ergodic due to the 
presence of KAM whisper galleries. These galleries are absent in the presence of flat points 
(by a theorem of John Mather) or points where the curvature is unbounded (by a theorem of 
Andrea Hubacher). Leonid Bunimovich constructed in 1979 the first convex chaotic 
billiard. No smooth convex billiard table with positive Kolmogorov-Sinai entropy is known. A 
candidate is the real analytic x⁴ + y⁴ = 1. Various generalizations have been considered like in 
[700]. A detailed proof that the Bunimovich stadium is measure theoretically conjugated to a 
Bernoulli system (the shift on a product space) is surprisingly difficult: one has to show positive 
Lyapunov exponents on a set of positive measure. Applying Pesin theory with singularities 
(Katok-Strelcyn theory [B69]) gives a Markov process. One needs then to establish ergodicity 
using a method of Eberhard Hopf of 1936 which requires to understand stable and unstable 
manifolds [123]. See for sources on billiards. 


128. UNIFORMIZATION 


A Riemann surface is a one-dimensional complex manifold. This means it is a connected 
two-dimensional real manifold so that the transition functions of the atlas are holomorphic 
mappings of the complex plane. It is simply connected if its fundamental group is trivial 
(equivalently, its genus b₁ is zero). Two Riemann surfaces are conformally equivalent or 
simply equivalent if they are equivalent as complex manifolds, that is if there is a bijective 
morphism f between them. A map f : S → S′ is holomorphic if for every choice of coordinates 
φ : S → C and φ′ : S′ → C, the maps φ′ ∘ f ∘ φ⁻¹ are holomorphic. The curvature is the 
Gaussian curvature of the surface. The uniformization theorem is: 


Theorem: A Riemann surface is equivalent to one with constant curvature. 


This is a “geometrization statement" and means that the universal cover of every Riemann 
surface is conformally equivalent to either a Riemann sphere (positive curvature), a complex 
plane (zero curvature) or a unit disk (negative curvature). It implies that any region G ⊂ C 
whose complement contains two or more points has a universal cover which is the disk. It 
especially implies the Riemann mapping theorem assuring that any region U homeomorphic 
to a disk is conformally equivalent to the unit disk (see [110]). For a detailed treatment 
of compact Riemann surfaces, see [249]. It also follows that all Riemann surfaces (without 
restriction of genus) can be obtained as quotients of these three spaces: for the sphere one 
does not have to take any quotient, the genus 1 surfaces = elliptic curves can be obtained 
as quotients of the complex plane and any genus g > 1 surface can be obtained as a quotient 
of the unit disk. Since every closed 2-dimensional orientable surface is characterized by its 
genus g, the uniformization theorem implies that any such surface admits a metric of constant 
curvature. Teichmüller theory parametrizes the possible metrics, and there are 3g − 3 
dimensional parameters for g ≥ 2, whereas for g = 0 there is one and for g = 1 a moduli space 
H/SL₂(Z). In higher dimensions, closest to the uniformization theorem is the Killing-Hopf 
theorem telling that every connected complete Riemannian manifold of constant sectional 
curvature and dimension n is isometric to the quotient of a sphere Sⁿ, Euclidean space Rⁿ 
or hyperbolic n-space Hⁿ, restating that constant curvature geometry is either elliptic, 
parabolic=Euclidean or hyperbolic geometry. Complex analysis has rich applications in complex 
dynamics [45] and relates to much more geometry [482]. 


129. CONTROL THEORY 


A Kalman filter is an optimal estimation algorithm for a linear dynamic system from a series 
of possibly noisy measurements. The idea is similar to that of a dynamic Bayesian network or 
hidden Markov model. The filter applies both to differential equations ẋ(t) = Ax(t) + 
Bu(t) + Gz(t) as well as to discrete dynamical systems x(t + 1) = Ax(t) + Bu(t) + Gz(t), where 
u(t) is external input and z(t) input noise given by independent identically distributed 
usually Gaussian random variables. Kalman calls this a Wiener problem. One does not 
see the state x(t) of the system but some output y(t) = Cx(t) + Du(t). The filter then “filters 
out" or “learns" the best estimate x*(t) from the observed data y(t). The linear space X is 
defined as the vector space spanned by the already observed vectors. The optimal solution is 
given by a sophisticated dynamical data fitting. 


Theorem: The optimal estimate x* is the projection of y onto X. 


This formulation is the informal 1-sentence description which can be found already in Kalman’s 
article. Kalman then gives explicit formulas which generate from the stochastic difference 
equation a concrete deterministic linear system. For a modern exposition, see [462]. The 
Kalman filter is named after Rudolf Kalman who wrote [362] in 1960. Kalman’s paper is 
one of the most cited papers in applied mathematics. The ideas were used both in the Apollo 
and Space Shuttle programs. Similar ideas have been introduced in statistics by the Danish 
astronomer Thorvald Thiele and the radar theoretician Peter Swerling. There is also a nonlinear 
version of the Kalman filter which is used in nonlinear state estimation like navigation systems 
and GPS. The nonlinear version uses a multi-variate Taylor series expansion to linearise about 
a working point. See [212] [462]. 
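
To make the predict and update structure concrete, here is a minimal sketch of a scalar discrete Kalman filter in Python. It is not Kalman’s original formulation: the names `kalman_step` and `run_filter`, the noise variances `q` and `r`, and the explicit measurement noise are assumptions added for illustration. The model is the scalar version of the discrete system above, x(t + 1) = a x(t) + b u(t) + noise and y(t) = c x(t) + d u(t) + noise.

```python
def kalman_step(x_est, p_est, u, y, a=1.0, b=0.0, c=1.0, d=0.0, q=0.0, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its error variance
    u, y:         known input and new measurement
    q, r:         assumed variances of process and measurement noise
    """
    # Predict: push the estimate and its variance through the linear dynamics.
    x_pred = a * x_est + b * u
    p_pred = a * p_est * a + q
    # Update: correct the prediction by the weighted measurement residual.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - (c * x_pred + d * u))
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new

def run_filter(measurements, x0=0.0, p0=1.0, **model):
    """Feed a list of measurements (with zero input u) through the filter."""
    x, p = x0, p0
    history = []
    for y in measurements:
        x, p = kalman_step(x, p, 0.0, y, **model)
        history.append(x)
    return history

# A static state observed nine times with the constant value 5.
estimates = run_filter([5.0] * 9)
```

In this static case with q = 0 and unit variances, the estimate after k measurements is 5k/(k + 1): the filter averages the k observations together with the prior guess 0, which is the least squares fit one expects from a projection.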


130. ZARISKI MAIN THEOREM 


A variety is called normal if it can be covered by open affine varieties whose rings of functions 
are normal. A commutative ring is called normal if it has no non-zero nilpotent elements and 
is integrally closed in its complete ring of fractions. For a curve, a one-dimensional variety, 
normality is equivalent to being non-singular but in higher dimensions, a normal variety still 
can have singularities. A normal complex variety X is called unibranch at a point x ∈ X if 
there are arbitrarily small neighborhoods U of x such that the set of non-singular points of U is 
connected. Zariski’s main theorem can be stated as: 


Theorem: Any closed point of a normal complex variety is unibranch. 


Oscar Zariski proved the theorem in 1943. To cite [504], “it was the final result in a foundational 
analysis of birational maps between varieties. The ‘main Theorem’ asserts in a strong sense 
that the normalization (the integral closure) of a variety X is the maximal variety X' birational 
over X, such that the fibres of the map X' → X are finite. A generalization of this fact 
became Alexandre Grothendieck’s concept of the ‘Stein factorization’ of a map. The result 
has been generalized to schemes: a scheme X is called unibranch at a point x if the local ring 
at x is unibranch. A generalization is the Zariski connectedness theorem from 1957: if 
f : X → Y is a birational projective morphism between Noetherian integral schemes, then the 
inverse image of every normal point of Y is connected. Put more colloquially, the fibres of a 
birational morphism from a projective variety X to a normal variety Y are connected. It implies 
that a birational morphism f : X → Y of algebraic varieties X, Y is an open embedding into a 
neighbourhood of a normal point y if f⁻¹(y) is a finite set. Especially, a birational morphism 
between normal varieties which is bijective near points is an isomorphism. 


131. POINCARE’S LAST THEOREM 


A homeomorphism T of an annulus X = 𝕋 × [0, 1] is called measure preserving if it 
preserves the Lebesgue (area) measure and preserves the orientation of X. As a homeomorphism 
it induces also homeomorphisms on each of the two boundary circles. It is called a twist 
homeomorphism if it rotates the boundaries in different directions. 


Theorem: A twist map on an annulus has at least two fixed points. 


This is called the Poincaré-Birkhoff theorem or Poincaré’s last theorem. It was stated by 
Henri Poincaré in 1912 in the context of the three body problem. Poincaré already gave an 
index argument showing that the existence of one fixed point gives a second. The existence of 
the first was proven by George Birkhoff in 1913; in 1925, Birkhoff added the precise argument 
for the existence of the second. The twist condition is necessary because the rotation of the 
annulus (r, θ) → (r, θ + 1) has no fixed point. Also area-preservation is necessary as the example 
(r, θ) → (r(2 − r), θ + 2r − 1) shows [64]. 


132. GEOMETRIZATION 


A closed manifold M is a smooth compact manifold without boundary. A closed manifold 
is simply connected if it is connected and the fundamental group is trivial, meaning that 
every closed loop in M can be pulled together to a point within M: (if r : 𝕋 → M is a 
parametrization of a closed path in M, then there exists a continuous map R : [0, 1] × 𝕋 → M 
such that R(0, t) = r(t) and R(1, t) = r(0).) We say that M is a 3-sphere if M is homeomorphic 
to the 3-dimensional unit sphere {(x₁, x₂, x₃, x₄) ∈ R⁴ | x₁² + x₂² + x₃² + x₄² = 1}. 


Theorem: A closed simply connected 3-manifold is a 3-sphere. 


Henri Poincaré conjectured this in 1904. It remained the Poincaré conjecture until its proof 
by Grigori Perelman in 2006 [498]. In higher dimensions, the statement was known as the 
generalized Poincaré conjecture; the case n > 4 had been proven by Stephen Smale in 
1961 and the case n = 4 by Michael Freedman in 1982. A d-homotopy sphere is a closed 
d-manifold that is homotopic to a d-sphere. (A manifold M is homotopic to a manifold N if 
there exists a continuous map f : M → N and a continuous map g : N → M such that the 
composition g ∘ f : M → M is homotopic to the identity map on M (meaning that there exists 
a continuous map F : M × [0, 1] → M such that F(x, 0) = g(f(x)) and F(x, 1) = x) and the 
map f ∘ g : N → N is homotopic to the identity on N.) The Poincaré conjecture itself, the case 
d = 3, was proven using a theory built by Richard Hamilton who suggested to use the Ricci 
flow to solve the conjecture and more generally the geometrization conjecture of William 
Thurston: every closed 3-manifold can be decomposed into prime manifolds which are of 
8 types, the so called Thurston geometries S³, E³, H³, S² × R, H² × R, SL(2, R), Nil, Solv. 
If the statement M is a sphere is replaced by M is diffeomorphic to a sphere, one has 
the smooth Poincaré conjecture. Perelman’s proof verifies this also in dimension d = 3. 
The smooth Poincaré conjecture is false in dimension d ≥ 7 as d-spheres then can admit 
non-standard smooth structures, so called exotic spheres constructed first by John Milnor. For 
d = 5 it is true following a result of Dennis Barden from 1964. It is also true for d = 6. For 
d = 4, the smooth Poincaré conjecture is open, and called “the last man standing among all 
great problems of classical geometric topology" [454]. See for details on Perelman’s proof. 


133. STEINITZ THEOREM 


A non-empty finite simple connected graph G is called planar if it can be embedded in the 
plane R² without self crossings. The abstract edges of the graph are then realized as actual 
curves in the plane connecting two vertices which are realized as actual points in the plane. 
The embedding of G in the plane subdivides the plane now into a finite collection F of simply 
connected regions called faces. (In the two dimensional plane, a region is simply connected 
if it is homeomorphic to a disc.) Let v = |V| be the number of vertices, e = |E| the number 
of edges and f = |F| the number of faces. A planar graph is called polyhedral if it can 
be realized as a convex polyhedron, a convex hull of finitely many points in R³. A graph is 
called 3-connected if it remains connected also after removing one or two of its vertices. A 
connected, planar 3-connected graph is also called a 3-polyhedral graph. The Polyhedral 
formula of Euler combined with Steinitz’s theorem means: 


Theorem: G planar ⇒ v − e + f = 2. Planar 3-connected ⇔ polyhedral. 


The Euler polyhedron formula was first noticed in examples by René Descartes [4] and 
written down in a secret notebook. It was realized by Euler in 1750 that the formula works 
for general planar graphs. Euler already gave an induction proof (also in 1752) but the first 
complete proof appears to have been given by Legendre in 1794. The Steinitz theorem was 
proven by Ernst Steinitz in 1922, even though he obtained the result already in 1916. In general, a 
planar graph always defines a finite generalized CW complex in which the faces are the 2-cells, 
the edges are the 1-cells and the vertices are the 0-cells. The embedding in the plane defines 
then a geometric realization of this combinatorial structure as a topological 2-sphere (as the 
2-sphere is the compactification of the plane). The structure is not required to be achievable 
in the form of a convex polyhedron. And it is in general not: take a tree graph for example, 
a connected graph without triangles and without closed loops. It is planar but it is not even 
2-connected. The number of vertices v and the number of edges e satisfy v − e = 1. After 
embedding the tree in the plane, we have exactly one face, so that f = 1. The Euler polyhedron 
formula v − e + f = 2 is verified, but the graph is far from polyhedral. Even in the extreme case, 
where G is a one-point graph, the Euler formula holds: in that case there are v = 1 vertices, 
e = 0 edges and f = 1 faces (given by the complement of the point in the plane) so that still 
v − e + f = 2 holds. The 3-connectedness assures that the realization can be done using convex 
polyhedra. It is then even possible to force the vertices of the polyhedron to be on the 
integer lattice points [718]. In [278], it is stated that the Steinitz theorem is “the most 
important and deepest known result for 3-polytopes". 
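
The two halves of the statement can be checked on small examples. The following Python sketch is only an illustration under stated assumptions: a polyhedron is encoded as a list of faces, each a cyclic list of vertex labels, and the names `edge_set`, `euler_characteristic`, `is_connected` and `is_3_connected` as well as the brute force connectivity test are choices made here, not taken from the text.

```python
from itertools import combinations

def edge_set(faces):
    """Collect the undirected edges along the boundary cycles of all faces."""
    return {frozenset((face[i], face[(i + 1) % len(face)]))
            for face in faces for i in range(len(face))}

def euler_characteristic(faces):
    """Return v - e + f for a polyhedron given by its list of faces."""
    vertices = {v for face in faces for v in face}
    return len(vertices) - len(edge_set(faces)) + len(faces)

def is_connected(vertices, edges):
    """Depth-first search restricted to the given vertex subset."""
    vertices = set(vertices)
    if not vertices:
        return True
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for edge in edges:
            if u in edge:
                stack.extend(w for w in edge if w != u and w in vertices)
    return seen == vertices

def is_3_connected(vertices, edges):
    """Connected, and still connected after removing any one or two vertices."""
    vertices = set(vertices)
    return all(is_connected(vertices - set(removed), edges)
               for k in (0, 1, 2) for removed in combinations(vertices, k))

# Cube with vertices 0..7 (binary coordinates) and tetrahedron with vertices 0..3.
CUBE = [[0, 1, 3, 2], [4, 5, 7, 6], [0, 1, 5, 4],
        [2, 3, 7, 6], [0, 2, 6, 4], [1, 3, 7, 5]]
TETRAHEDRON = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]
# A path with three vertices: a tree, planar but not polyhedral.
PATH_EDGES = {frozenset((0, 1)), frozenset((1, 2))}
```

For the cube one gets 8 − 12 + 6 = 2 and the edge graph passes the 3-connectedness test, while the path satisfies v − e + f = 3 − 2 + 1 = 2 but falls apart when its middle vertex is removed.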


134. HILBERT-EINSTEIN ACTION 


Let (M, g) be a smooth 4-dimensional Lorentzian manifold which is asymptotically flat. 
(A simplification is that the Riemannian curvature tensor R is flat outside a compact subset 
of M but this is a bit restrictive as the Schwarzschild solution below indicates.) A Lorentzian 
manifold is a 4-dimensional pseudo Riemannian manifold of signature (1,3) which in the flat 
case is dx² + dy² + dz² − dt². The technical condition of asymptotic flatness should imply that 
the volume form dμ then has the property that the scalar curvature R is in L¹(M, dμ) (which 
is the case if the non-flat part is compact.) One can now look at the variational problem to 
find extrema of the functional g → ∫_M R dμ. More generally, one can add a Lagrangian L and 
consider the Hilbert-Einstein functional ∫_M (R/κ + L) dμ, where κ = 8πG/c⁴ is the Einstein 
constant. Let R_ij be the Ricci tensor, a symmetric tensor, and T_ij the energy-momentum 
tensor. The Einstein field equations are 


Theorem: R_ij − (1/2) g_ij R = κ T_ij. 


These are the Euler-Lagrange equations of an infinite-dimensional extremization problem. The 
variational problem was proposed by David Hilbert in 1915. Einstein published in the same 
year the general theory of relativity. In the case of a vacuum: T = 0, solutions g of the 
Einstein equations define Einstein manifolds (M, g). An example of a solution to the vacuum 
Einstein equations different from the flat space solution is the Schwarzschild solution, which 
was found also in 1915 and published in 1916. It is the metric given in spherical coordinates 
as −(1 − r/ρ)c² dt² + (1 − r/ρ)⁻¹ dρ² + ρ² dφ² + ρ² sin²(φ) dθ², where r is the Schwarzschild 
radius, ρ the distance to the singularity, θ, φ are the standard Euler angles (longitude and 
colatitude) in calculus. The metric solves the Einstein equations for ρ > r. The flat metric 
−dt² + dρ² + ρ² dφ² + ρ² sin²(φ) dθ² describes the vacuum and the Schwarzschild solution describes 
the gravitational field near a massive body. Intuitively, the metric tensor g is determined by 
g(v, v), and the Ricci tensor by R(v, v) which is 3 times the average sectional curvature over 
all planes passing through v. The scalar curvature is 6 times the average over 
all sectional curvatures passing through a point. See [129]. 


135. HALL STABLE MARRIAGE 


Let X be a finite set and 𝒜 a family of finite subsets A of X. A transversal of 𝒜 is an injective 
function f : 𝒜 → X such that f(A) ∈ A for all A ∈ 𝒜. The set 𝒜 satisfies the marriage 
condition if for every finite subset ℬ of 𝒜, one has |ℬ| ≤ |⋃_{A∈ℬ} A|. The Hall marriage 
theorem is 


Theorem: 𝒜 has a transversal ⇔ 𝒜 satisfies the marriage condition. 


The theorem was proven by Philip Hall in 1935. It implies for example that if a deck of cards 
with 52 cards is partitioned into 13 equal sized piles, one can choose from each pile a card so 
that the 13 cards have exactly one card of each rank. The theorem can be deduced from a 
result in graph theory: if G = (V, E) = (X, ∅) + (Y, ∅) is a bipartite graph, then a matching 
in G is a collection of edges which pairwise have no common vertex. For a subset W of X, let 
S(W) denote the set of all vertices adjacent to some element in W. The theorem assures that 
there is an X-saturating matching (a matching that covers X) if and only if |W| ≤ |S(W)| 
for every W ⊂ X. The reason for the name “marriage” is the situation that X is a set of 
men and Y a set of women and that all men are eager to marry. Let A_i be the set of women 
which could make a spouse for the i’th man, then marrying everybody off is an X-saturating 
matching. The condition is that any set of k men has a combined list of at least k women who 
would make suitable spouses. See [93]. 
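
Both sides of the equivalence can be computed for small families. The sketch below, in Python, checks the marriage condition by brute force over all subfamilies and searches for a transversal with the usual augmenting path method for bipartite matchings. The function names `marriage_condition` and `find_transversal` and the sample families are assumptions made for illustration.

```python
from itertools import combinations

def marriage_condition(family):
    """Check |B| <= |union of B| for every non-empty subfamily B (brute force)."""
    for k in range(1, len(family) + 1):
        for sub in combinations(family, k):
            if len(set().union(*sub)) < k:
                return False
    return True

def find_transversal(family):
    """Return a list t with t[i] in family[i] and all t[i] distinct, or None."""
    owner = {}  # element -> index of the set it currently represents

    def assign(i, visited):
        # Try to give set i a representative, re-routing earlier choices if needed.
        for x in family[i]:
            if x in visited:
                continue
            visited.add(x)
            if x not in owner or assign(owner[x], visited):
                owner[x] = i
                return True
        return False

    for i in range(len(family)):
        if not assign(i, set()):
            return None
    transversal = [None] * len(family)
    for x, i in owner.items():
        transversal[i] = x
    return transversal

GOOD = [{1, 2}, {2, 3}, {1, 3}]   # has a transversal
BAD = [{1, 2}, {1, 2}, {1, 2}]    # three sets share only two elements
```

The brute force check is exponential in the size of the family and only meant for tiny examples; the augmenting path search runs in polynomial time and finds a transversal exactly when the condition holds.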


136. MANDELBULB 


The Mandelbrot set M = M_{2,2} is the set of vectors c ∈ R² for which T(x) = x² + c leads 
to a bounded orbit starting at 0 = (0, 0), where x² has the polar coordinates (r², 2θ) if x 
has the polar coordinates (r, θ). (The map T is just a real reformulation of the complex map 
T(z) = z² + c in C, written in the real so that the construction can be done in arbitrary 
dimensions.) The Mandelbulb set M_{3,8} is defined as the set of vectors c ∈ R³ for which 
T(x) = x⁸ + c leads to a bounded orbit starting at 0 = (0, 0, 0), where x⁸ has the spherical 
coordinates (ρ⁸, 8φ, 8θ) if x has the spherical coordinates (ρ, φ, θ). Like the Mandelbrot 
set, it is a compact set (just verify that for |x| > 2, the orbits go to infinity). The topology 
of M_{3,8} is unexplored. Also like in the complex plane, one could look at the dynamics of 
polynomials p = a_0 + a_1 x + ··· + a_m x^m in Rⁿ. If (ρ, φ_1, ..., φ_{n−1}) are spherical coordinates, then 
x → x^m = (ρ^m, mφ_1, ..., mφ_{n−1}) is a higher dimensional “power" and allows to look at the 
dynamics of T_{n,p}(x) = p(x). This defines then a corresponding Mandelbulb M_{n,p}. As with all 
celebrities, there is a scandal: 


Theorem: There is no theorem about the Mandelbulb M_{n,m} for n > 2. 


Except of course the just stated theorem. But you decide whether it is true or not. The 
Mandelbulb set has been discovered only recently. An attempt to trace some of its history was 
done in [391]: already Rudy Rucker had experimented with a variant of M_{3,2} in 1988. Jules 
Ruis wrote me that he had written a computer program in “Basic" in 1997. The first person we 
know who wrote down the formulas used today is Daniel White, mentioned in a 2009 fractal 
forum. Jules Ruis 3D printed the first models in 2010. See also for some information on 
generating the graphics. 
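
The definition translates directly into an escape time test. The following Python sketch is a rough illustration only: it assumes φ is the colatitude and θ the longitude, uses the escape radius 2 mentioned above, and can of course only test boundedness up to a finite number of iterations. The names `spherical_power` and `in_mandelbulb` and the iteration count are choices made here.

```python
import math

def spherical_power(x, m):
    """Map (rho, phi, theta) to (rho^m, m*phi, m*theta), in Cartesian coordinates."""
    X, Y, Z = x
    rho = math.sqrt(X * X + Y * Y + Z * Z)
    if rho == 0.0:
        return (0.0, 0.0, 0.0)
    phi = math.acos(max(-1.0, min(1.0, Z / rho)))  # colatitude (assumed convention)
    theta = math.atan2(Y, X)                        # longitude
    r = rho ** m
    return (r * math.sin(m * phi) * math.cos(m * theta),
            r * math.sin(m * phi) * math.sin(m * theta),
            r * math.cos(m * phi))

def in_mandelbulb(c, m=8, max_iter=50, escape=2.0):
    """Return True if the orbit of 0 under x -> x^m + c stays within the escape radius."""
    x = (0.0, 0.0, 0.0)
    for _ in range(max_iter):
        p = spherical_power(x, m)
        x = (p[0] + c[0], p[1] + c[1], p[2] + c[2])
        if math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2) > escape:
            return False
    return True
```

A picture of the set is obtained by evaluating `in_mandelbulb` on a grid of points c in R³; a returned `True` only means that no escape was detected within `max_iter` steps.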


137. BANACH ALAOGLU 


A Banach space X is a linear space equipped with a norm | · | defining a metric d(x, y) = 
|x − y| with respect to which the space X is complete. The unit ball in X is the closed 
ball {x ∈ X | |x| ≤ 1}. The dual space X* of X is the linear space of continuous linear functionals 
f : X → R with the norm |f| = sup_{|x|≤1, x∈X} |f(x)|. It is again a Banach space. The weak* 
topology is the smallest topology on X* which makes all maps f → f(x) continuous for all 
x ∈ X. 


Theorem: The unit ball in a dual Banach space X* is weak* compact. 


The theorem was proven in 1932 in the separable case by Stefan Banach and in 1940 in general 
by Leonidas Alaoglu. The result essentially follows from Tychonov’s theorem as X* can be seen 
as a closed subset of a product space. Banach-Alaoglu therefore relies on the axiom of choice. A 
case which often appears in applications is when X = C(K) is the space of continuous functions 
on a compact Hausdorff space K. In that case X* is the space of signed measures on K. One 
implication is that the set of probability measures on K is compact. Another example are 
L^p spaces (p ∈ [1, ∞)), for which the dual is L^q with 1/p + 1/q = 1 (meaning q = ∞ for p = 1) 
and showing that for p = 2, the Hilbert space L² is self-dual. In the work of Bourbaki the 
theorem was extended from Banach spaces to locally convex spaces (linear spaces equipped 
with a family of semi-norms). Examples are Fréchet spaces (locally convex spaces which are 
complete with respect to a translation-invariant metric). See [142]. 


138. WHITNEY TRICK 


Let M be a smooth orientable simply connected d-manifold and K, L two smooth connected 
submanifolds of dimension k and l such that k + l = d, which have the property that K and L 
intersect transversely in points x, y in the sense that the tangent spaces at the intersection 
points span T_xM and T_yM and that they have opposite intersection sign. The two manifolds 
K, L can be isotoped from each other along a disc if there exists a smooth 2-disk D embedded 
in M whose boundary consists of an arc in K and an arc in L, both joining x and y. The disk 
is called a Whitney disk. The Whitney trick or Whitney lemma is: 


Theorem: Any transverse K, L of ≥ 3 manifolds in M has a Whitney disk. 


See [181]. In there are counter examples in d ≤ 4. The author writes there “A hypothesis 
of algebraic topology given by the signs of the intersection points leads to the existence of 
an isotopy". The failure of the Whitney trick in smaller dimensions is one reason why some 
questions in manifold theory appear hardest in three or four dimensions. There is a variant 
of the Whitney trick which works also in dimension 5, where K has dimension 2 and L has 
dimension 3. 


139. TORSION GROUPS 


An elliptic curve E over Q is also called a rational elliptic curve. The curve E carries 
an Abelian group structure where every addition of a point x → x + y is a morphism. The 
torsion subgroup of E is the subgroup consisting of elements which all have finite order in 
E. The Mordell-Weil theorem (which applies more generally for any Abelian variety) assures 
that E = Z^r ⊕ T, where T is a finite group and r is a finite number called the rank of E. 
Mazur’s torsion theorem states that the only possible finite orders in E are 1, 2, 3, ..., 9, 10 
and 12. Only 15 different torsion subgroups appear in rational elliptic curves: Z_1, ..., Z_10, Z_12 
or Z_2 × Z_2, Z_2 × Z_4, Z_2 × Z_6 and Z_2 × Z_8. Let us call this collection of groups the Mazur class. 
The theorem is: 


Theorem: The torsion group of a rational elliptic curve is in the Mazur class. 


The theorem was proven by Barry Mazur in 1977 [602]. 


140. COLORING 


A graph G = (V, E) with vertex set V and edge set E is called planar if it can be embedded in 
the Euclidean plane R² without any of the edges intersecting. By a theorem of Kuratowski, this 
is equivalent to a graph theoretical statement: G does not contain a homeomorphic image of 
either the complete graph K_5 or the bipartite utility graph K_{3,3}. A graph coloring with k 
colors is a function f : V → {1, 2, ..., k} with the property that if (x, y) ∈ E, then f(x) ≠ f(y). 
In other words, adjacent vertices must have different colors. The 4-color theorem is: 


Theorem: Every planar graph can be colored with 4 colors. 


Some graphs need 4 colors like a wheel graph having an odd number of spokes. There are 
planar graphs which need fewer. The 1-point graph K_1 needs only one color, trees need only 
2 colors and the graph K_3 or any wheel graph with an even number of spokes only need 3 
colors. The theorem has an interesting history: since August Ferdinand Möbius in 1840 spread 
a precursor problem given to him by Benjamin Gotthold Weiske, the problem was first known 
also as the Möbius-Weiske puzzle [614]. The actual problem was first posed in 1852 by 
Francis Guthrie [463], after thinking about it with his brother Frederick, who communicated 
it to his teacher Augustus de Morgan, a former teacher of Francis who told William Hamilton 
about it. Arthur Cayley in 1878 put it first in print (but it was still not in the language of 
graph theory). Alfred Kempe published a proof in 1879. But a gap was noticed by Percy John 
Heawood 11 years later in 1890. There were other unsuccessful attempts like one by Peter 
Tait in 1880. After considerable theoretical work by various mathematicians including Charles 
Peirce, George Birkhoff, Oswald Veblen, Philip Franklin, Hassler Whitney, Hugo Hadwiger, 
Leonard Brooks, William Tutte, Yoshio Shimamoto, Heinrich Heesch, Karl Dürre or Walter 
Stromquist, a computer assisted proof of the 4-color theorem was obtained by Ken Appel 
and Wolfgang Haken in 1976. In 1997, Neil Robertson, Daniel Sanders, Paul Seymour, and 
Robin Thomas wrote a new computer program. Georges Gonthier produced in 2004 a fully 
machine-checked proof of the four-color theorem [698]. There is a considerable literature like 
[698]. 
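
The small examples mentioned above can be verified by exhaustive search. The Python sketch below is an illustration only: the names `wheel_graph`, `is_proper` and `chromatic_number` are chosen here, and the brute force search over all colorings is only feasible for very small graphs.

```python
from itertools import product

def wheel_graph(spokes):
    """Hub 0 joined to a rim cycle 1..spokes; returns (vertex count, edge list)."""
    edges = [(0, i) for i in range(1, spokes + 1)]
    edges += [(i, i % spokes + 1) for i in range(1, spokes + 1)]
    return spokes + 1, edges

def is_proper(coloring, edges):
    """A coloring is proper if adjacent vertices receive different colors."""
    return all(coloring[x] != coloring[y] for x, y in edges)

def chromatic_number(n, edges):
    """Smallest k for which a proper coloring with k colors exists (brute force)."""
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            if is_proper(coloring, edges):
                return k
    return None

K1 = (1, [])
PATH = (3, [(0, 1), (1, 2)])              # a small tree
K3 = (3, [(0, 1), (1, 2), (0, 2)])
```

Here `chromatic_number(*wheel_graph(5))` evaluates to 4 and `chromatic_number(*wheel_graph(4))` to 3, in line with the remark about odd and even numbers of spokes.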


141. CONTACT GEOMETRY 


Assume M is a smooth compact orientable (2n − 1)-manifold equipped with an auxiliary 
Riemannian metric g. A 1-form α ∈ Λ¹(M) defines a field of hyperplanes ξ = ker(α) ⊂ TM. 
Conversely, given a field of hyperplanes, one can define α = g(X, ·), where X is a local non-zero 
section of the line bundle ξ^⊥. A contact structure is a hyperplane field ξ = ker(α) for which 
the volume form α ∧ (dα)^(n−1) is nowhere zero. The 1-form α is then called a contact form and 
(M, ξ) is called a contact manifold. The Reeb vector field R is defined by dα(R, ·) = 0, 
α(R) = 1. The Weinstein conjecture is a theorem in dimension 3: 


Theorem: On a 3-manifold, the Reeb vector field has a closed periodic orbit. 


The theorem was proven by Clifford Taubes in 2007 using Seiberg-Witten theory. Mike 
Hutchings with Taubes established 2 Reeb orbits under the condition that all Reeb orbits are 
non-degenerate in the sense that the linearized flow does not have an eigenvalue 1. 
Hutchings with Dan Cristofaro-Gardiner later removed the non-degeneracy condition and 
also showed that if the product of the actions A(γ) = ∫_γ α of the two orbits is larger than the 
volume ∫_M α ∧ dα of the contact form, then there are three. To the history: Alan Weinstein 
has shown already that if Y is a convex compact hypersurface in R^(2n), then there is a periodic 
orbit. Paul Rabinowitz extended it to star-shaped surfaces. Weinstein conjectured in 1978 that 
every compact hypersurface of contact type in a symplectic manifold has a closed 
characteristic. Contact geometry as an odd dimensional brother of symplectic geometry has become 
its own field. Contact structures are the opposite of integrable hyperplane fields: the 
Frobenius integrability condition α ∧ dα = 0 defines an integrable hyperplane field forming a 
co-dimension 1 foliation of M. Contact geometry is therefore a “totally non-integrable 
hyperplane field" [243]. The higher dimensional case of the Weinstein conjecture is wide open [338]. 
Also the symplectic question whether every compact and regular energy surface H = c for a 
Hamiltonian vector field in R^(2n) has a periodic solution is open. One knows that there are for 
almost all energy values in a small interval around c [817]. 


142. SIMPLICIAL SPHERES 


A convex polytope G is defined as the convex hull of n points in R^d such that all n points 
are extreme points, called vertices. (Extreme points are points which do not lie in an open 
line segment of G.) This definition of [278] is also called a polytopal sphere. A simplicial 
sphere is a geometric realization of a simplicial complex that is homeomorphic to the standard 
(d−1)-dimensional sphere in R^d. For a polytopal sphere, the boundary of G is made up of 
(d−1)-dimensional polytopes called (d−1)-faces. A cyclic polytope C(n, d) can be realized 
as the convex hull of the n vertices {(t, t², t³, ..., t^d) | t = 1, 2, ..., n} ⊂ R^d. Let f_k(G) denote 
the number of k-dimensional faces in G. So, f_0(G) is the number of vertices, f_1(G) the number 
of line segments and f_{d−1} the number of facets, the highest dimensional faces in G. Extending 
the definition to f_{−1} = 1 (counting the empty complex, which is a (−1)-dimensional complex), 
the vector f = (f_{−1}, f_0, f_1, ..., f_{d−1}) is called the extended f-vector of G. The upper bound 
theorem is 


Theorem: For simplicial spheres with f_0(G) = n, one has f_k(G) ≤ f_k(C(n, d)). 


This had been the upper bound conjecture of Theodore Motzkin from 1957 which was
proven by Peter McMullen in 1970, who reformulated it as $h_k(G) \le \binom{n-d+k-1}{k}$ for all $k < d/2$,
as the other numbers are determined by the Dehn-Sommerville conditions $h_k = h_{d-k}$ for
$0 \le k \le d$. The h-vector $(h_0, \dots, h_d)$ and f-vector $(f_{-1}, f_0, \dots, f_{d-1})$ determine each other via
$\sum_{k=0}^{d} f_{k-1} (t-1)^{d-k} = \sum_{k=0}^{d} h_k t^{d-k}$. Victor Klee suggested the upper bound conjecture to be
true for simplicial spheres, which was then proven by Richard Stanley in 1975 using new
ideas like relating $h_k$ with intersection cohomology of a projective toric variety associated
with the dual of G. (A toric variety is an algebraic variety containing an algebraic torus as
an open dense subset such that the group action on the torus extends to the variety.) The
result for simplicial spheres implies the result for convex polytopes because a subdivision of
faces of a convex polytope into simplices only increases the numbers $f_k$. The g-conjecture of
McMullen from 1971 gives a complete characterization of f-vectors of simplicial spheres. Define
$g_0 = 1$ and $g_k = h_k - h_{k-1}$ for $k \le d/2$. The g-conjecture claims that $(g_0, \dots, g_{[d/2]})$ appears as
a g-vector of a sphere triangulation if and only if there exists a multicomplex Γ with exactly
$g_k$ vectors of degree k for all $0 \le k \le [d/2]$. (A multicomplex Γ is a set of non-negative
integer vectors $(a_1, \dots, a_n)$ such that if $0 \le b_i \le a_i$, then $(b_1, \dots, b_n)$ is in Γ. The degree of a
vector in a multicomplex is $\sum_i a_i$.) The g-theorem proves this for polytopal spheres (Billera and Lee in
1980 giving sufficiency and Stanley in 1980 giving necessity). The g-conjecture is open for simplicial
spheres. [718] [621] [121]


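
The passage between f-vector and h-vector is mechanical. The following Python lines are a minimal sketch of it, assuming the relation $\sum_k f_{k-1} (t-1)^{d-k} = \sum_k h_k t^{d-k}$ stated above; the function names and the two test complexes (boundary of the tetrahedron and of the octahedron) are illustrative choices and not taken from the cited references.

```python
from math import comb

def h_vector(f_ext):
    """Turn an extended f-vector (f_{-1}, f_0, ..., f_{d-1}) into (h_0, ..., h_d)."""
    d = len(f_ext) - 1
    coeffs = [0] * (d + 1)  # coeffs[j] will hold the coefficient of t^j
    for k, f in enumerate(f_ext):
        m = d - k  # expand f_{k-1} * (t - 1)^m with the binomial theorem
        for j in range(m + 1):
            coeffs[j] += f * comb(m, j) * (-1) ** (m - j)
    # h_k is the coefficient of t^(d-k)
    return [coeffs[d - k] for k in range(d + 1)]

def g_vector(h):
    """g_0 = 1 and g_k = h_k - h_{k-1} for k <= d/2."""
    d = len(h) - 1
    return [1] + [h[k] - h[k - 1] for k in range(1, d // 2 + 1)]

octahedron = [1, 6, 12, 8]   # empty face, vertices, edges, triangles
tetrahedron = [1, 4, 6, 4]
print(h_vector(octahedron), g_vector(h_vector(octahedron)))
```

Both examples produce a symmetric h-vector, as the Dehn-Sommerville conditions require.

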
143. BERTRAND POSTULATE 


A basic result in number theory is 
Theorem: For n > 1, there always exists a prime p between n and 2n. 


As the theorem was conjectured in 1845 by Joseph Bertrand, it is still called Bertrand’s
postulate. Since Pafnuty Tschebyschef’s (Chebyshev) proof in 1852, it is a theorem. For
a proof, see [342] page 367. Srinivasa Ramanujan simplified Chebyshev’s proof considerably
in 1919 and strengthened it: if $\pi(x) = \sum_{p \le x, \, p \text{ prime}} 1$ is the prime counting function, then
Bertrand’s result can be restated as $\pi(x) - \pi(x/2) \ge 1$ for $x \ge 2$. Ramanujan shows that
$\pi(x) - \pi(x/2) \ge k$ for large enough x (larger or equal than $p_k$). The primes $p_k$ giving the
lower bound for x solving this are called Ramanujan primes. Simple proofs like one of Erdős
from 1932 are given in Wikipedia or [833] page 82, who notes "it is not a very sharp result.
Deep analytic methods can be used to give much better results concerning the gaps between
successive primes". There is a very simple proof assuming the Goldbach conjecture (stating
that every even number larger than 2 is a sum of two primes) [559]: if n is not prime, then
2n = p + q is a sum of two primes, where one, say q, is larger than n and smaller than 2n; on
the other hand, if n > 2 is prime, then n + 1 is not prime and 2n + 2 = p + q is a sum of two
primes, where one, say q, is larger than n and smaller than 2n + 2. But q can not be 2n + 1 (as
that would mean p = 1), nor 2n (as 2n is composite) so that n < q < 2n. There are various
generalizations like Mohamed El Bachraoui’s 2006 theorem that there are primes between 2n
and 3n or Denis Hanson from 1973 that there are primes between 3n and 4n for n > 1.
Mohamed El Bachraoui asked in 2006 whether for all n > 1 and all k ≤ n, there exists a
prime in [kn, (k + 1)n], which is for k = 1 the Bertrand postulate. A positive answer would give
that there is always a prime in the interval $[n^2, n^2 + n]$. Already the Legendre conjecture,
asking whether there is always a prime p satisfying $n^2 < p < (n+1)^2$ for n ≥ 1, is open.
Legendre’s conjecture is the fourth of the super famous great problems of Edmund Landau’s
1912 list: the other three are the Goldbach conjecture, the twin prime conjecture and
then the Landau conjecture asking whether there are infinitely many primes of the form
$n^2 + 1$. Landau really nailed it. There are 4 conjectures only, but all of them can be stated
in half a dozen words, are completely elementary, and for more than 100 years, nobody has
proven nor disproved any of them.
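
Both Bertrand's postulate and the (open) Legendre conjecture are easy to test numerically in a finite range. The following Python sketch uses a sieve of Eratosthenes; the function names and the chosen ranges are assumptions made for illustration, and a finite check of course proves nothing about all n.

```python
def prime_sieve(limit):
    """Return a bytearray s with s[k] == 1 exactly when k <= limit is prime."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # cross out the multiples p*p, p*p + p, ...
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sieve

def prime_strictly_between(lo, hi, sieve):
    """Is there a prime p with lo < p < hi?"""
    return any(sieve[p] for p in range(lo + 1, hi))

sieve = prime_sieve(10_000)
bertrand_ok = all(prime_strictly_between(n, 2 * n, sieve) for n in range(2, 5000))
legendre_ok = all(prime_strictly_between(n * n, (n + 1) ** 2, sieve) for n in range(1, 100))
print(bertrand_ok, legendre_ok)
```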


144. NON-SQUEEZING THEOREM 


The Euclidean space $M = R^{2n}$ carries the standard symplectic 2-form $\omega(v, w) = (v, Jw)$ with
the skew-symmetric matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$. A linear transformation $f : M \to M, x \mapsto Ax$ is
called symplectic, if A satisfies $A^T J A = J$. A smooth transformation $f : M \to M$ is called
a symplectomorphism if it is a diffeomorphism and if the derivative df is a symplectic map
from $T_x M \to T_{f(x)} M$ at every point $x \in M$. Any smooth map for which df is symplectic is
automatically a local diffeomorphism as symplectic matrices have determinant 1 and are so invertible.
Let $B(r) = \{x \in M \mid x \cdot x \le r^2\}$ denote the round solid ball of radius r and $Z(r) = \{x \in
M \mid x_1^2 + y_1^2 \le r^2\}$ the solid cylinder of radius r. Given two sets A, B, one says there is a
symplectic embedding of A in B, if there exists a symplectomorphism f such that $f(A) \subset B$.
As symplectic maps are volume preserving, a necessary condition is Vol(A) ≤ Vol(B). Is this
the only constraint? Yes, for n = 1, where the cylinder and the ball are the same as defined:
B(r) = Z(r). But no in higher dimensions n ≥ 2 by the Gromov non-squeezing theorem:


Theorem: A symplectic embedding $B(r) \to Z(R)$ implies $r \le R$.


The theorem has been proven in 1985 by Michael Gromov. It has been dubbed the principle
of the symplectic camel by Maurice de Gosson, referring to the “eye of the needle"
metaphor. A reformulation of the Gosson allegory [158] after encoding “camel" = “ball in the
phase space", “hole" = “cylinder", “pass" = “symplectically embed into", “size of the hole"
= “radius of cylinder" and “size of the camel" = “radius of the ball" is: “There is no way that
a camel can pass through a hole if the size of the hole is smaller than the size of the camel".
See [477] [339] for expositions. The non-squeezing theorem motivated also the introduction of
symplectic capacities, quantities which are monotone, $c(M) \le c(N)$ if there is a symplectic
embedding of M into N, which are conformal in the sense that if ω is scaled by λ, then c(M)
is scaled by |λ|, and such that $c(B(1)) = c(Z(1)) = \pi$. For n = 1, the area is an example of a
symplectic capacity (actually unique). The existence of a symplectic capacity obviously proves
the non-squeezing theorem. Already Gromov introduced an example, the Gromov width, which
is the smallest. More are constructed using calculus of variations. See [317] [478].
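
For linear maps, the difference between "volume preserving" and "symplectic" can be checked by hand. The sketch below tests the condition $A^T J A = J$ with exact rational arithmetic, assuming the coordinate order $(x_1, \dots, x_n, y_1, \dots, y_n)$; the helper names and the two diagonal examples are illustrative assumptions. The matrix `squeeze` has determinant 1 and maps the ball B(r) into the cylinder Z(r/2), but it is not symplectic, in line with the non-squeezing theorem.

```python
from fractions import Fraction

def standard_J(n):
    """Block matrix J = [[0, I], [-I, 0]] of size 2n x 2n."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1
        J[n + i][i] = -1
    return J

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_symplectic(A):
    """Check A^T J A = J exactly."""
    J = standard_J(len(A) // 2)
    return matmul(matmul(transpose(A), J), A) == J

def diagonal(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

half, third = Fraction(1, 2), Fraction(1, 3)
stretch = diagonal([2, 3, half, third])  # scales x_k by a_k and y_k by 1/a_k
squeeze = diagonal([half, 2, half, 2])   # shrinks the (x_1, y_1) plane, expands (x_2, y_2)
print(is_symplectic(stretch), is_symplectic(squeeze))
```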


145. KÄHLER GEOMETRY


A Kähler manifold is a complex manifold (M, J) together with a Hermitian metric h whose
associated Kähler form ω is closed. (The manifold can be given by a Riemannian metric g
compatible with the complex structure: g(JX, JY) = g(X, Y). The Kähler form ω is then
a 2-form ω(X, Y) = g(JX, Y) satisfying dω = 0 and the metric h = g + iω is the Hermitian
metric. (M, ω) is then also a symplectic manifold.) As ω is closed, it represents an element in
the cohomology $H^2(M)$ called the Kähler class. The Calabi inverse problem is: given a
compact Kähler manifold $(M, \omega_0)$ and a (1,1)-form R representing 2π times the first Chern
class of M, find a metric ω in the Kähler class of $\omega_0$ such that Ricci(ω) = R. In local
coordinates, one can write $Ricci(\omega) = -i \partial \bar{\partial} \log \det(g)$. For compact M:


Theorem: The Calabi inverse problem has a unique solution ω.


This was conjectured in 1957 by Eugenio Calabi and proven in 1978 by Shing-Tung Yau by
solving nonlinear Monge-Ampère equations using analytic Nash-Moser type techniques. The
theorem implies that if the first Chern class of M is zero, then $(M, \omega_0)$ carries a unique
Ricci-flat Kähler metric g in the same Kähler class as $\omega_0$. Kähler geometry deals simultaneously
with Riemannian, symplectic and complex structures: (M, g) is a Riemannian, (M, ω) is
a symplectic and (M, J) is a complex manifold. The inverse problem of characterizing geometries
from curvature data is central in all of differential geometry. Here are some examples: a)
$M = C^n$ with Euclidean metric g is Kähler with $\omega = (i/2) \sum_k dz^k \wedge d\bar{z}^k$ but it is not compact.
But if Γ is a lattice, then the induced metric on the torus $C^n/\Gamma$ is Kähler. b) Because complex
submanifolds of a Kähler manifold are Kähler, and the complex projective space $CP^n$ with the
Fubini-Study metric is Kähler (with $\omega = i \partial \bar{\partial} \rho$, where $\rho = \log(1 + \sum_k |z_k|^2/2)$ is the Kähler
potential), any complex projective variety is Kähler. c) For the complex hyperbolic case
where M is the unit ball in $C^n$, the Kähler potential is $\rho = 1 - |z|^2$. By Kodaira, Kähler
forms representing an integral cohomology class correspond to projective algebraic varieties. d)
Calabi-Yau manifolds are complex Kähler manifolds with zero first Chern class. Examples
are K3 surfaces. The existence theorem assures that they carry a Ricci-flat metric, which is
an example of a Kähler-Einstein metric. Also Hodge theory works well for Kähler manifolds. In
the complex case, the Dolbeault operators $\partial, \bar{\partial}$ and $d = \partial + \bar{\partial}$ lead to Hodge Laplacians $\Delta_\partial, \Delta_{\bar{\partial}}$
and $\Delta_d$, and so to harmonic forms $H^{p,q}$ for differential forms of type (p,q) and harmonic
r-forms $H^r$ for $\Delta_d$. In the Kähler case, $H^r = \bigoplus_{p+q=r} H^{p,q}$. An example result due to Lichnerowicz is
that if $Ricci(\omega) \ge \lambda > 0$, then the first eigenvalue $\lambda_1$ of Δ satisfies $\lambda_1 \ge 2\lambda$. See [3].


146. PROJECTIVE GEOMETRY 


A conic section is a curve which is obtained when intersecting a cone $x^2 + y^2 = z^2$ with a plane
ax + by + cz = d. A bit more general is a conic, an algebraic curve $ax^2 + bxy + cy^2 + dx + ey + g = 0$
of degree 2. They are either non-singular conics, classified as ellipses like $x^2 + y^2 = 1$,
hyperbola $x^2 - y^2 = 1$ or parabola $x^2 = y$, or then degenerate conics like a point $x^2 + y^2 = 0$,
the cross $x^2 = y^2$, the line $x^2 = 0$ or pair of parallel lines $x^2 = 1$. Given 6 different
points $A_1, A_2, A_3, B_1, B_2, B_3$ on a conic, where $A_1, A_2, A_3$ are neighboring and $B_1, B_2, B_3$ are
neighboring, a Pascal configuration is the set of lines $A_i B_j$ with $i \ne j$. The intersection
points of this Pascal configuration are the three intersections of $A_i B_j$ with $A_j B_i$, where
{i, j} runs over all three 2-point subsets of {1, 2, 3}.


Theorem: The intersection points of a Pascal configuration are on a line. 


The theorem was found in 1639 by Blaise Pascal (as a teenager) in the case of an ellipse. A
limiting case where we have two crossing lines is the Pappus hexagon theorem, which goes
back to Pappus of Alexandria who lived around 320 AD. The Pappus hexagon theorem is
one of the first known results in projective geometry.
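
Pascal's theorem can be verified exactly for rational points on the unit circle. The sketch below works in homogeneous coordinates, where the line through two points and the intersection of two lines are both given by a cross product, and three points are collinear if their 3×3 determinant vanishes. The parametrization by a rational parameter t, the function names and the chosen parameter values are assumptions made for this illustration.

```python
from fractions import Fraction

def circle_point(t):
    """Rational point on x^2 + y^2 = 1 in homogeneous coordinates (x, y, 1)."""
    t = Fraction(t)
    return ((1 - t * t) / (1 + t * t), 2 * t / (1 + t * t), Fraction(1))

def cross(u, v):
    """Line through two points, or intersection point of two lines."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def det3(u, v, w):
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def pascal_determinant(A, B):
    """Determinant of the three points (A_i B_j) ∩ (A_j B_i); zero means collinear."""
    points = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        points.append(cross(cross(A[i], B[j]), cross(A[j], B[i])))
    return det3(*points)

A = [circle_point(t) for t in (0, 1, 2)]
B = [circle_point(t) for t in (-1, -3, 5)]
print(pascal_determinant(A, B))
```

Using homogeneous coordinates avoids special cases: if two of the lines happen to be parallel, their intersection is a point at infinity and the determinant test still applies.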


147. VITALI THEOREM 


The Lebesgue measure in Euclidean space $R^n$ is a Borel measure which is invariant under
Euclidean transformations. It is the Haar measure of the locally compact group $R^n$ and unique
if one normalizes it so that the unit cube has measure 1. In dimension n = 1, the Lebesgue
measure of an interval [a, b] is b − a. In dimension n = 2, the Lebesgue measure of a measurable
set is the area of the set. In particular, a ball of radius r has area $\pi r^2$. When constructing
the measure one has to specify a σ-algebra, which is in the Lebesgue case the Borel σ-algebra
generated by the open sets in $R^n$. One has for every n ≥ 1:


Theorem: There exist sets in $R^n$ that are not Lebesgue measurable.


The result is due to Giuseppe Vitali from 1905. It justifies why one has to go through all the
trouble of building a σ-algebra carefully and why it is not possible to work with the complete
σ-algebra of all subsets of $R^n$ (which is called the discrete σ-algebra). The proof of the Vitali
theorem shows connections with the foundations of mathematics: by the axiom of choice
there exists a set V which represents equivalence classes in T/Q, where T is the circle. For
this Vitali set V, the translates $V_r = V + r$ with $r \in Q$ are all disjoint and satisfy $\bigcup_{r \in Q} (r + V) = T$
and so form a partition. By the Lebesgue measure property, all translated sets $V_r$ have the
same measure. As they are countably many, are disjoint and add up to a set of Lebesgue
measure 1, their common measure can neither be zero (the sum would be 0) nor positive (the
sum would be infinite). This contradicts σ-additivity. Now lift V to R
and then build $V \times R^{n-1}$. More spectacular are decompositions of the unit ball into 5 disjoint
sets which can be moved by Euclidean transformations and reassembled to
get two disjoint unit balls. This is the Banach-Tarski construction from 1924.


148. WILSON’S THEOREM 


The factorial n! of a natural number n is defined as n! = 1 · 2 ··· n. For example, 5! = 120.
Theorem: n > 1 is prime if and only if (n — 1)! + 1 is divisible by n. 


For n = 5 for example, (5 − 1)! + 1 = 25 is divisible by 5. For n = 6 we have (6 − 1)! + 1 = 121
which is not divisible by 6. Indeed, 6 = 2 · 3 is not prime. The theorem is named after John
Wilson, who was a student of Edward Waring. It seems that Joseph-Louis Lagrange gave the
first proof in 1771. It is not a practical way to determine whether a number is prime [625]:
"from a computational point of view, it is probably one of the world’s least efficient primality
tests, since computing (n − 1)! takes so many steps". Also named after Wilson are the Wilson
primes. These are primes for which not only p but $p^2$ divides (p − 1)! + 1. The smallest one
is 5. It is not known whether there are infinitely many.
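
As a small illustration of both the criterion and its inefficiency, here is a Python sketch that computes (n − 1)! modulo n with n − 2 multiplications and uses it as a primality test and as a Wilson prime test. The function names are chosen for this illustration only.

```python
def factorial_mod(k, m):
    """Compute k! modulo m, reducing after every multiplication."""
    result = 1 % m
    for i in range(2, k + 1):
        result = result * i % m
    return result

def is_prime_wilson(n):
    """Wilson's criterion: n > 1 is prime iff n divides (n-1)! + 1."""
    return n > 1 and (factorial_mod(n - 1, n) + 1) % n == 0

def is_wilson_prime(p):
    """A Wilson prime is a prime p for which p^2 divides (p-1)! + 1."""
    return is_prime_wilson(p) and (factorial_mod(p - 1, p * p) + 1) % (p * p) == 0

print([n for n in range(2, 30) if is_prime_wilson(n)])
print([p for p in range(2, 600) if is_wilson_prime(p)])
```

The test needs of the order of n multiplications for a number n, which is exponential in the number of digits of n.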


149. CARLESON THEOREM 


If $f \in L^2(T)$, where $T = R/(2\pi Z)$ is the circle, then the Fourier transform $L^2(T) \to l^2(Z)$
gives a Fourier series $g(x) = \sum_{n \in Z} c_n e^{inx}$, where $c = (\dots, c_{-2}, c_{-1}, c_0, c_1, c_2, \dots) \in l^2(Z)$ is
given by $c_n = (2\pi)^{-1} \int_T f(x) e^{-inx} \, dx$. For smooth f, one knows g = f and Parseval’s identity
$(2\pi)^{-1} \int_T |f(x)|^2 \, dx = \sum_n |c_n|^2$ so that the Fourier transform extends to an unitary operator $L^2(T) \to
l^2(Z)$. This does not say anything yet about the convergence of the sequence $g_n(x)$. We say the
Fourier series converges to f at a point x, if the sequence $g_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}$ converges to
f(x) for $n \to \infty$. We say a sequence $g_n(x)$ converges almost everywhere to f, if there exists a
set $Y \subset T$ of full Lebesgue measure μ(Y) = 1 such that the sequence converges for all $x \in Y$. (The
Lebesgue measure is the normalized Haar measure dx/(2π) on the circle.) That the question
can be subtle is illustrated by the result of Andrey Kolmogorov from 1923 to 1926, who gave
examples of $L^1(T)$ functions for which the Fourier series diverges everywhere.


Theorem: The Fourier series of an $L^2$ function converges almost everywhere.


The statement had been conjectured by Nikolai Luzin in 1915 and was known as the Luzin
conjecture. The theorem was proven by Lennart Carleson in 1966. An extension to $L^p$ with
$p \in (1, \infty]$ was proven by Richard Hunt in 1968. The proof of the Carleson theorem is difficult.
It is mentioned in harmonic analysis texts or surveys like [380], who say about the
Carleson-Hunt theorem that it is one of the deepest and least understood parts of the theory.


150. INTERMEDIATE VALUE 


Let (X, O) be a connected topological space and f : X → R a continuous map. We say that
f reaches both positive and negative signs if there exist a, b ∈ X such that f(a) < 0
and f(b) > 0. A root of f is a point x ∈ X such that f(x) = 0. Let C(X) denote the set of
continuous functions from X to R. This means that for f ∈ C(X) and all open sets U in R,
one has $f^{-1}(U) \in O$.


Theorem: f € C(X) reaching both signs on a connected X has a root. 


The theorem was proven by Bernard Bolzano in 1817 for functions from the interval [a, b] to
R. The proof follows from the definitions: as P = (0, ∞) is open, also $f^{-1}(P)$ is open. As
N = (−∞, 0) is open, also $f^{-1}(N)$ is open. If there is no root, then $X = f^{-1}(N) \cup f^{-1}(P)$ is a disjoint union
of two non-empty open sets and so disconnected. This contradicts the assumption of X being connected.
A consequence is the wobbly table theorem: a square table with parallel equal length
legs, standing on a “floor" given by the graph z = g(x, y) of a continuous g, can be rotated and possibly
translated in the z direction so that all 4 legs are on the floor. The proof of this application
is seen as a consequence of the intermediate value theorem applied to the height function f(φ)
of the fourth leg if three other legs are on the floor. A consequence is also Rolle’s theorem,
assuring that a continuously differentiable function f : [a, b] → R with f(a) = f(b) has a point
x ∈ (a, b) with f′(x) = 0. Tilting Rolle gives the mean value theorem assuring that for a
continuously differentiable function f : [a, b] → R, there exists x ∈ (a, b) with f′(x) = (f(b) − f(a))/(b − a).
The general theorem shows that it is the connectedness and not the completeness of X which
is the important assumption.
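
In the classical case X = [a, b], the theorem can be made constructive by bisection: keep halving an interval on whose end points f has opposite signs. The following Python sketch is one way to do this; the function name, the tolerance parameter and the example function are assumptions of this illustration.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Locate a root of a continuous f on [a, b], assuming a < b and a sign change."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if (fm < 0) == (fa < 0):
            a, fa = m, fm   # sign change is in the right half
        else:
            b = m           # sign change is in the left half
    return (a + b) / 2

root = bisect_root(lambda x: x ** 3 - 2, 0.0, 2.0)
print(root)
```

Each step halves the interval, so about 40 steps suffice to reach the default tolerance on an interval of length 2.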


151. PERRON-FROBENIUS 


An n × n matrix A is non-negative if $A_{ij} \ge 0$ for all i, j and positive if $A_{ij} > 0$ for all i, j.
The Perron-Frobenius theorem is:


Theorem: A positive matrix has a unique largest eigenvalue. 


The theorem has been proven by Oskar Perron in 1907 and by Georg Frobenius in 1908
[233]. When seeing the map x → Ax on the projective space, this is in suitable coordinates a
contraction and the Banach fixed point theorem applies. This is the proof of Garrett Birkhoff
who used the Hilbert metric [413]. The Brouwer fixed point theorem only gives existence, not
uniqueness, but the Brouwer fixed point theorem applies for non-negative matrices. This has applications
in graph theory, Markov chains or Google page rank. The Google matrix is defined as
G = dA + (1 − d)E, where d is a damping factor, A is a Markov matrix defined by the
network and E is the matrix $E_{ij} = 1$. Sergey Brin and Larry Page write “the damping factor
d is the probability at each page the random surfer will get bored and request another random
page". The page rank equation is Gx = x. In other words, the Google Page rank vector (the
one billion dollar vector), is a Perron-Frobenius eigenvector. It assigns page rank values to the
individual nodes of the network. See [438]. For the linear algebra of non-negative matrices, see
[494].
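
The Perron-Frobenius eigenvector of a positive matrix can be approximated by power iteration, which is the contraction argument in numerical form. The sketch below does this for a Google matrix of a hypothetical three page network. It assumes a column-stochastic A and uses the normalization $E_{ij} = 1/n$, so that G is again a Markov matrix; this normalization, the damping value 0.85 and all names are assumptions of the illustration.

```python
def google_matrix(A, d):
    """G = d*A + (1-d)*E with E_ij = 1/n (assumed normalization), A column-stochastic."""
    n = len(A)
    return [[d * A[i][j] + (1 - d) / n for j in range(n)] for i in range(n)]

def apply(G, x):
    n = len(G)
    return [sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]

def power_iteration(G, steps=200):
    """Iterate x -> Gx, renormalizing so that the entries sum to 1."""
    n = len(G)
    x = [1.0 / n] * n
    for _ in range(steps):
        y = apply(G, x)
        s = sum(y)
        x = [v / s for v in y]
    return x

# Hypothetical network: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
# Column j holds the transition probabilities out of page j.
A = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
G = google_matrix(A, 0.85)
rank = power_iteration(G)
print(rank)
```

Since all entries of G are positive, the iteration converges to the same vector from any positive starting vector.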


152. CONTINUUM HYPOTHESIS 


$\aleph_0$ is the cardinality of the natural numbers N. $\aleph_1$ is the next larger cardinality. The
cardinality of the real numbers R is $2^{\aleph_0}$. The statement $2^{\aleph_0} = \aleph_1$ is the continuum hypothesis
abbreviated CH. The Zermelo-Fraenkel axiom system ZFC of set theory is the most common
foundational axiomatic framework of mathematics. The letter C refers to the axiom of
choice.


Theorem: Neither $2^{\aleph_0} = \aleph_1$ nor $2^{\aleph_0} \ne \aleph_1$ can be proven in ZFC.


This result combines a result of Kurt Gödel from 1938 (CH is consistent with ZFC) and
Paul Cohen (the negation of CH is consistent with ZFC) from 1963 [133] [134]. Cantor had for a long
time tried to prove that the continuum hypothesis holds. The Gödel-Cohen theorem shows
that any such effort has been in vain and illustrates why Cantor was doomed not to succeed.
The problem had then been the first of Hilbert’s problems of 1900. The result has been
summarized in words as: Gödel solved the substructure problem in 1938. Over
25 years later Cohen, arguably the Galois of set theory, solved the extension problem.


153. HOMOTOPY-HOMOLOGY 


Given a path connected pointed topological space X with base b, the n’th homotopy group
$\pi_n(X)$ is the set of equivalence classes of base preserving maps from the pointed sphere $S^n$ to
X. It can be written as the set of homotopy classes of maps from the n-cube $[0,1]^n$ to X
such that the boundary of $[0,1]^n$ is mapped to b. It becomes a group by defining addition as
$(f+g)(t_1, \dots, t_n) = f(2t_1, t_2, \dots, t_n)$ for $0 \le t_1 \le 1/2$ and $(f+g)(t_1, \dots, t_n) = g(2t_1 - 1, t_2, \dots, t_n)$
for $1/2 \le t_1 \le 1$. In the case n = 1, this is “joining the trip": travel first along the first curve
with twice the speed, then take the second curve. The groups $\pi_n$ do not depend on the base
point. As X is assumed to be connected, $\pi_0(X)$ is the trivial group. The group $\pi_1(X)$ is
the fundamental group. It can be non-abelian. For n ≥ 2, the groups $\pi_n(X)$ are always
Abelian: f + g = g + f. The n’th homology group $H_n(X)$ of a topological space X with
integer coefficients is obtained from the chain complex of the free abelian group generated by
continuous maps from n-dimensional simplices to X. The Hurewicz theorem is


Theorem: There exists a homomorphism $\pi_n(X) \to H_n(X)$.


Higher homotopy groups were discovered by Witold Hurewicz during the years 1935-1936. The
Hurewicz theorem itself has then been established in 1950 [337]. In the case n = 1, the
homomorphism can be easily described: if γ : [0,1] → X is a closed path, then since [0,1] is a
1-simplex, the path is a singular 1-simplex in X. As the boundary of γ is empty, this singular
1-simplex is a cycle. This allows to see it as an element in $H_1(X)$. If two paths are homotopic,
then their corresponding singular simplices are equivalent in $H_1(X)$. There is an elegant proof
using Hodge theory if X = M is a compact manifold: the image C of a map in $\pi_n(M)$ can be
interpreted as a Schwartz distribution on M. Let $L = (d + d^*)^2$ be the Hodge Laplacian and
let the heat flow $e^{-tL}$ act on C. For t > 0, the image $e^{-tL}C$ is now smooth and defines a
differential form in $\Lambda^p(M)$. As all the non-zero eigenspaces get damped exponentially, the
limit of the heat flow is a harmonic form, an eigenvector to the eigenvalue 0. But Hodge
theory identifies $\ker(L|\Lambda^p)$ with $H^p(M)$ and so with $H_n(M)$ by Poincaré duality. The Hurewicz
homomorphism is then even constructive. “Just heat up the curve to get the corresponding
cohomology element, the commutator group elements get melted away by the heat." A space
X is called n-connected if $\pi_i(X) = 0$ for all i ≤ n. So, 0-connected means path connected
and 1-connected is simply connected. For n ≥ 2, one has $\pi_n(X)$ isomorphic to $H_n(X)$ if X
is (n−1)-connected. In the case n = 1, this can already not be true as $\pi_1(X)$ is in general
non-commutative and $H_1(X)$ is commutative, but $H_1(X)$ is isomorphic to the abelianization of $G = \pi_1(X)$,
which is the group obtained by factoring out the commutator subgroup [G, G], which is a normal
subgroup of G and generated by all the commutators $g^{-1} h^{-1} g h$ of group elements g, h of G.


See 02]. 


154. PICK’S THEOREM


Let P be a simple polygon in the plane $R^2$. This means that it is given by a finite ordered
set of points called vertices $P_i = (x_i, y_i)$, i = 0, ..., n − 1, such that the line segments $P_i P_{mod(i+1,n)}$
called edges joining neighboring points do not intersect. The polygon defines a polygonal
region G with area A. Assume now that all coordinates $x_i, y_i$ are integers. Let I be the number
of lattice points $(k, l) \in Z^2$ inside G and B the number of lattice points at the boundary of G.
Pick’s theorem assures:


Theorem: A = I + B/2 − 1.


The result was found in 1899 by Georg Pick [541]. For a triangle for example with no interior
points and B = 3, one has 0 + 3/2 − 1 = 1/2, for a rectangle parallel to the coordinate axes with I = n · m
interior points and B = 2n + 2m + 4 boundary points and area A = (n + 1)(m + 1) also
I + B/2 − 1 = A. The theorem has become a popular school project assignment in early
geometry courses as there are many ways to prove it. An example is to cut away a triangle and
use induction on the area, after verifying that if two polygons are joined along a line segment, the
functional I + B/2 − 1 is additive. There are other explicit formulas for the area like Green’s
formula $A = \frac{1}{2} \sum_i (x_i y_{i+1} - x_{i+1} y_i)$ which does not assume the vertices $P_i = (x_i, y_i)$ to be lattice
points.
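
Pick's formula can be confirmed on examples by computing all three quantities independently. In the following Python sketch, the area comes from Green's formula (kept as twice the area to stay in integers), the boundary count from greatest common divisors along the edges, and the interior count from a brute force point-in-polygon test over the bounding box. All function names and the two test polygons are assumptions of this illustration.

```python
from math import gcd

def edges(poly):
    n = len(poly)
    return [(poly[i], poly[(i + 1) % n]) for i in range(n)]

def twice_area(poly):
    """Twice the area by Green's (shoelace) formula, an integer for lattice polygons."""
    return abs(sum(a[0] * b[1] - b[0] * a[1] for a, b in edges(poly)))

def boundary_count(poly):
    """Lattice points on an edge from a to b (excluding b): gcd(|dx|, |dy|)."""
    return sum(gcd(abs(b[0] - a[0]), abs(b[1] - a[1])) for a, b in edges(poly))

def on_segment(p, a, b):
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def strictly_inside(p, poly):
    """Ray casting to the right of p, done in integer arithmetic."""
    if any(on_segment(p, a, b) for a, b in edges(poly)):
        return False
    inside = False
    for a, b in edges(poly):
        if (a[1] > p[1]) != (b[1] > p[1]):
            dy = b[1] - a[1]
            lhs = (p[0] - a[0]) * dy
            rhs = (p[1] - a[1]) * (b[0] - a[0])
            # p lies to the left of the crossing point of the edge with the ray
            if (dy > 0 and lhs < rhs) or (dy < 0 and lhs > rhs):
                inside = not inside
    return inside

def interior_count(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return sum(strictly_inside((x, y), poly)
               for x in range(min(xs), max(xs) + 1)
               for y in range(min(ys), max(ys) + 1))

triangle = [(0, 0), (4, 0), (0, 3)]
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
for poly in (triangle, l_shape):
    # Pick: A = I + B/2 - 1, written as 2A = 2I + B - 2
    print(twice_area(poly), 2 * interior_count(poly) + boundary_count(poly) - 2)
```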


155. ISOSPECTRAL DRUMS 


On a compact region $G \subset R^2$ with piecewise smooth boundary ∂G one can look at the Dirichlet
problem −Δf = 0 in the interior of G and f = 0 on ∂G. The region is considered a “drum". If
hit, one hears the spectrum of the Laplacian $\Delta u = u_{xx} + u_{yy}$. There is a sequence of Dirichlet
eigenvalues $0 = \lambda_0 < \lambda_1 < \lambda_2 < \cdots$, real values which solve $-\Delta u_n = \lambda_n u_n$ for some functions
$u_n$ which are zero on the boundary. For example, if G is the square [0, π] × [0, π], then the
eigenvalues are $n^2 + m^2$ with eigenvectors sin(nx) sin(my). The eigenvalue 0 belongs to the
constant eigenfunction. Two drums are called isospectral, if they have the same eigenvalues.
Two drums are non-isometric, if there is no transformation generated by rotations, translations
and reflections which maps one drum to the other.


Theorem: There exist non-isometric but isospectral drums. 


Mark Kac had asked in 1962 “Can one hear the shape of a drum?". Caroline Gordon,
David Webb and Scott Wolpert answered this question negatively [259]. In the convex case,
the question is still open.


156. BERTRAND THEOREM 


The path r(t) of a particle in $R^n$ moving in a central force potential V(x) = f(|x|) experiences
the central force $F = -\nabla V(x) = -f'(|x|) x/|x|$. In the case of the Newton potential
V(x) = −GMm/|x|, the central mass M, the body mass m as well as the gravitational
constant G determine the force $F(x) = -x GMm/|x|^3$. The motion of the particle follows the
differential equations $r''(t) = -MG r(t)/|r|^3$, which conserve the energy $E(r) = m r'^2/2 + V(r)$
and angular momentum $L = m r \wedge r'$, a n(n − 1)/2 dimensional quantity. The invariance
of L assures that r(t) stays in the plane initially spanned by r(0) and r′(0) and that the area
of the parallelogram spanned by r(t) and r′(t) is constant. To see what the natural potential in $R^n$
is, one has to go beyond Newton and pass to Gauss, who wrote the gravitational law in the
form div(F) = 4πρ, where ρ is the mass density. It expresses that mass is the source for the
force field F. To get the force field in a central symmetric mass distribution, one can use the
divergence theorem in $R^n$ and relate the integral of 4πρ over a ball of radius r with the flux
of F through the sphere S(r) of radius r. The former is 4πM, where M is the total mass in
the ball, the latter is −|S(r)|F(r), where |S(r)| is the surface area of the sphere and the negative
sign is because for an attractive force F(r) points inside. So, in three dimensions, Gauss
recovers the Newton gravitational law $F(r) = -4\pi GM/|S(r)| = -GM/|r|^2$. There is a natural
central force Kepler problem in any dimension: in $R^n$, we have $F(r) = -C_n r/|r|^n$ where $C_n$
is a constant. For n = 1, there is a constant force pulling the particle towards the center, for
n = 2, one has a 1/|r| force which corresponds to a logarithmic potential, for n = 3, it is the
Newtonian inverse square $1/r^2$ force, in n = 4, it is a $1/r^3$ force. For n = 0, one formally gets
the harmonic oscillator which is Hooke’s law. Which potentials lead to periodic motion?
The answer is surprising and was given by Bertrand: only the harmonic oscillator potential and
the Newtonian potential in $R^3$ work. Let us call a central force potential all periodic if every
bounded (position and velocity) solution r(t) of the differential equations is periodic. Already
for the Kepler problem, there are not only motions on ellipses but also scattering solutions
moving on parabola or hyperbola, or then suicide motions, with r′(0) = 0, where the particle
dives into the singularity.


Theorem: Only the Newton potentials for n = 0 and n = 3 are all periodic.


This theorem of Joseph Bertrand from 1873 tells that three dimensional space is special, as in
any other dimension, calendars would be almost periodic as the solutions to the Kepler problem
would not close up. We could live with that but there are more compelling reasons why n = 3
is dynamically better: in other dimensions, only very special orbits stay bounded. A small
perturbation leads to the planet colliding with the sun or escaping to infinity. Gauss’s analysis
allows also to compute the force F(r) in distance r to the center of a n-dimensional ball with
constant mass density. The divergence theorem gives 4πρ|B(r)| = −|S(r)|F(r), where |B(r)|
is the volume of the solid ball of radius r and |S(r)| the surface area of the sphere. This gives
the Hooke law force F(x) = −4πρx/n, where n is the dimension.
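
The last step uses the relation between the volume of a ball and the area of its boundary sphere in $R^n$. A short derivation in the notation above, written here for the radial component F(r) and assuming the scaling $|S(s)| = |S(1)| s^{n-1}$ of the sphere area, is:

```latex
\begin{aligned}
|B(r)| &= \int_0^r |S(s)| \, ds = |S(1)| \, \frac{r^n}{n} = \frac{r \, |S(r)|}{n}, \\
4\pi\rho \, |B(r)| &= -|S(r)| \, F(r)
\quad \Longrightarrow \quad
F(r) = -4\pi\rho \, \frac{|B(r)|}{|S(r)|} = -\frac{4\pi\rho \, r}{n} .
\end{aligned}
```

The force inside the ball grows linearly with the distance to the center, which is the Hooke law in every dimension n.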


157. CATASTROPHE THEORY 


Catastrophe theory describes the singularity structure of smooth functions f on a n-manifold M
parametrized by some r parameters. A basic assumption is that configurations of interest
of the functional f are critical points of f : M → R. Especially interesting are minima, stable
configurations. When changing parameters of f, bifurcations, structural changes of the critical
set can happen. Especially, minima can change their nature or disappear. In particular, the
function $f_t(x_t)$, where $x_t$ is a local minimum, can change discontinuously, even if the function
$(t, x) \to f_t(x)$ is smooth. Such discontinuous changes are called catastrophes. The stage for
Thom’s theorem is a smooth function $f : R^n \times R^r \to R$. One can think of f as a r parameter family
of functions on space $R^n$. Let $\nabla_x = (\partial_{x_1}, \dots, \partial_{x_n})$ be the gradient operator with respect to
the space variables and let $M_f = \{(x, y) \in R^n \times R^r \mid \nabla_x f = 0\}$ be the submanifold on which points
are critical. The space $X = C^\infty(R^n \times R^r)$ of smooth functions in space and parameter can be
equipped with the Whitney topology, the topology generated by a basis which is the union
of all the basis sets of $C^k$ Whitney topologies. A basis for the latter is the set of all functions for
which $f^{(j)}(x, y) \in U_j$ for all $0 \le j \le k$ and $U_0, \dots, U_k$ are all open intervals. With the Whitney
$C^\infty$ topology, X is a Baire space so that residual sets (countable intersections of open dense
sets), are dense. The next theorem works for n = 2, r ≤ 6 and for n ≥ 3 if r ≤ 5:


Theorem: For a residual set in X, $M_f$ is an r-dimensional manifold.


The theorem is due to René Thom who initiated catastrophe theory in a 1966 article,
building on previous work by Hassler Whitney. More work and proofs were done
by various mathematicians like John Mather or Bernard Malgrange. There is more to it: the
restriction $X_f$ of the projection of the singularity set $M_f$ onto the parameter space $R^r$ can be
classified. Thom proved that for r = 4, there are exactly seven elementary catastrophes:
“fold", “cusp", “swallowtail", “butterfly", “hyperbolic umbilic", “elliptic umbilic" and “parabolic
umbilic". For r = 5, the number of catastrophe types is 11. The subject is part of singularity
theory of differentiable maps, a theory that was started by Hassler Whitney in 1955. The theory
of bifurcations was developed by Henri Poincaré and Alexander Andronov. See also
[668]. It is also widely studied in the context of dynamical systems [496].


158. PHASE TRANSITION 


Given a finite simple graph G = (V, E), an interaction function J : E → R and a scalar
field h : V → R define a Hamiltonian $H(\sigma) = -\sum_{(i,j) \in E} J_{ij} \sigma_i \sigma_j - \mu \sum_{j \in V} h_j \sigma_j$ on the set
of all functions σ : V → {−1, 1}. The interpretation is that $\sigma_i$ are spin values, $h_j$ an
external magnetic field and $J_{ij}$ is an interaction function. The additional parameter
μ is a magnetic moment. The energy H defines a probability measure P on the set $\Omega =
\{-1, 1\}^V$ of all spin configurations. It is the Gibbs-Boltzmann distribution $P[\{\sigma\}] = e^{-H(\sigma)}/Z$,
where Z is the normalization constant rendering P a probability measure. One calls Z the
partition function (as it is usually considered to be a function of some of the parameters
like temperature). Given a random variable (= observable) X : Ω → R, one is interested in the
expectation E[X]. An example is $X(\sigma) = \sigma_i \sigma_j$, which leads to the correlation. When replacing
H with βH, where β = 1/(KT) is an inverse temperature parameter (T is the temperature
and K the Boltzmann constant), one can study the expectation of a random variable X in
dependence of β. One writes now also $E[X] = \langle X \rangle_\beta$ to stress the dependence on β. In the case
when G is a d-dimensional lattice $G = [-L, L]^d$, where two lattice points x, y are connected
if $\sum_k |x_k - y_k| = 1$, one looks at the $L \to \infty$ van Hove limit, where $G = Z^d$. In the case
J = 1, h = 0 this is the Ising model. As J is positive, this is a ferromagnetic situation. A
parameter value, where a quantity like $Z_\beta$ or a derivative of it changes discontinuously, is called
a phase transition.


Theorem: The Ising model in two dimensions has a phase transition. 


This was first proven by Lars Onsager in 1944, who in a tour de force gave analytical solutions. 
The analysis shows that there is a phase transition. The temperature T at which this happens 
is called the Curie temperature. The one dimensional case had been solved by Ernst Ising 
in 1925, who got it as a PhD project from his adviser Wilhelm Lenz. In one dimension, there 
is no phase transition. In three and higher dimensions, there are no analytical solutions. The 
Ising model is only one of many models and generalizations. If the $J_{ij}$ are random one deals 
with disordered systems. An example is the Edwards-Anderson model, where $J_{ij}$ are 
Gaussian random variables. This is an example of a spin glass model. Another example is 
the Sherrington-Kirkpatrick model from 1975, where the lattice is replaced by a complete 
graph and the $J_{ij}$ define a random matrix. Another possibility is to change the spin to $\mathbb{Z}_n$, 
or the symmetric group (Potts) or then some other Lie group (Lattice gauge fields) and then 
use a character to get a numerical value. Or one replaces the zero-dimensional sphere $\mathbb{Z}_2$ with 
a higher dimensional sphere like $S^2$ and takes $\sigma_i \cdot \sigma_j$ (Heisenberg model). See [604]. 


159. CEVA THEOREM 


Given a triangle ABC in the Euclidean plane $\mathbb{R}^2$ and a point O in the interior, let A’ be the 
point where the line AO meets the segment BC, let B’ be the point where the line BO meets the segment 
AC and let C’ be the point where the line CO meets the segment AB. One can look at the ratios 
r(AB) = AC’/C’B and r(BC) = BA’/A’C and r(CA) = CB’/B’A in which these points divide 
the sides of the triangle. The Ceva theorem is 


Theorem: r(AB)r(BC)r(CA) = 1 


The theorem is named after Giovanni Ceva who wrote it down in 1678. The result is older 
however: Al-Mu’taman ibn Hud from Zaragoza proved it already in the 11th century [820]. 


See [576]. 
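The statement can be checked numerically. The following is a small sketch, assuming that A’, B’, C’ are the points where the lines AO, BO, CO meet the opposite sides; the helper names `intersect`, `dist` and `ceva_product` are hypothetical.

```python
import math

def intersect(p1, p2, p3, p4):
    """Intersection of the line p1p2 with the line p3p4 (assumed not parallel)."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ceva_product(A, B, C, O):
    """Product r(AB) r(BC) r(CA) for the cevians through the interior point O."""
    A1 = intersect(A, O, B, C)  # A' on BC
    B1 = intersect(B, O, A, C)  # B' on AC
    C1 = intersect(C, O, A, B)  # C' on AB
    r_AB = dist(A, C1) / dist(C1, B)
    r_BC = dist(B, A1) / dist(A1, C)
    r_CA = dist(C, B1) / dist(B1, A)
    return r_AB * r_BC * r_CA

print(ceva_product((0.0, 0.0), (4.0, 0.0), (1.0, 3.0), (1.5, 1.0)))
```

Up to rounding errors, the product is 1 for every interior point O.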


160. ANGLE THEOREM 


Given a circle C in the plane $\mathbb{R}^2$. Denote by M its center point. Pick two points A, B on C. 
If P is a point on C, then the angle APB is constant for all P on C which are on the same side as M 
with respect to the segment AB. The angle APB is called the inscribed angle of the secant 
AB. The next theorem is also called the inscribed angle theorem. 


Theorem: The angle APB is half the angle AMB. 


The theorem is believed to have been known already to Thales of Miletus who is the first Greek 
mathematician known by name (624 - 546 BC). It is usually called Thales theorem in the 
special case when A, B are on a diameter. Then the angle APB is a right angle. A consequence 
of the theorem is that the opposite angles of a quadrilateral which is inscribed in a circle add 
up to $\pi$. Unlike the special case of the right angle which immediately follows from symmetry, 
the full version of Thales theorem can surprise at first. 
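A quick numerical experiment illustrates the statement. This is a sketch on the unit circle centered at M = (0,0); the helper names `angle_at` and `on_circle` are hypothetical. The points P are chosen on the arc which lies on the same side of AB as M.

```python
import math

def angle_at(P, A, B):
    """Angle APB at the vertex P, as a number in [0, pi]."""
    ax, ay = A[0] - P[0], A[1] - P[1]
    bx, by = B[0] - P[0], B[1] - P[1]
    return math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by)

def on_circle(t):
    """Point on the unit circle with polar angle t."""
    return (math.cos(t), math.sin(t))

M = (0.0, 0.0)
A, B = on_circle(0.3), on_circle(1.7)
central = angle_at(M, A, B)  # the angle AMB
inscribed = [angle_at(on_circle(t), A, B) for t in (2.5, 3.0, 4.0, 5.5)]
print(central, inscribed)

# Thales: if A, B are on a diameter, the inscribed angle is a right angle.
thales = angle_at(on_circle(1.0), on_circle(0.0), on_circle(math.pi))
print(thales)
```

All four inscribed angles agree and are half of the central angle AMB.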


161. TOTAL CURVATURE 


A smooth simple closed curve C in $\mathbb{R}^3$ is called a knot. If r(t) is a parametrization of 
C on [a,b], then $\kappa(t) = |r'(t) \times r''(t)|/|r'(t)|^3$ is called the curvature of the curve at 
the point r(t). The integral $K(C) = \int_a^b \kappa(t) |r'(t)| \, dt$ is the total curvature of C. We say C is 
unknotted if C can be continuously deformed to a circle $S = \{x^2 + y^2 = 1, z = 0\} = \{r_1(t) = 
(\cos(t), \sin(t), 0), t \in [0,2\pi]\}$, meaning that there exists a smooth function R(t,s) such that 
$R(t,0) = r(t)$ and $R(t,1) = r_1(t)$ and such that for any s, the curve $C_s : t \mapsto R(t,s)$ is a simple 
closed curve. 


Theorem: If C is a knot and $K(C) < 4\pi$, then C is unknotted. 


This is the theorem of Fáry-Milnor, proven by Fáry in 1949 and Milnor in 1950. The theorem 
follows also from the existence of quadrisecants, which are lines intersecting the knot in 4 
points [163]. The existence of quadrisecants was proven by Erika Pannwitz in 1933 for smooth 
knots and generalized in 1997 by Greg Kuperberg to tame knots, knots which are equivalent 
to polygonal knots. 
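The total curvature can be approximated numerically. The following sketch integrates $\kappa(t)|r'(t)| = |r'(t) \times r''(t)|/|r'(t)|^2$ with a Riemann sum; the function name `total_curvature` is hypothetical, and the trefoil parametrization $(\sin t + 2\sin 2t, \cos t - 2\cos 2t, -\sin 3t)$ is a common choice which is assumed here, it does not appear in the text above.

```python
import math

def total_curvature(d1, d2, period=2 * math.pi, steps=20000):
    """Riemann sum of |r' x r''| / |r'|^2 over one period.

    d1(t) and d2(t) return the first and second derivative of r as 3-tuples.
    """
    dt = period / steps
    total = 0.0
    for k in range(steps):
        t = k * dt
        vx, vy, vz = d1(t)
        ax, ay, az = d2(t)
        cx = vy * az - vz * ay
        cy = vz * ax - vx * az
        cz = vx * ay - vy * ax
        speed2 = vx * vx + vy * vy + vz * vz
        total += math.sqrt(cx * cx + cy * cy + cz * cz) / speed2 * dt
    return total

# Unit circle r(t) = (cos t, sin t, 0).
circle = total_curvature(
    lambda t: (-math.sin(t), math.cos(t), 0.0),
    lambda t: (-math.cos(t), -math.sin(t), 0.0))

# Trefoil r(t) = (sin t + 2 sin 2t, cos t - 2 cos 2t, -sin 3t).
trefoil = total_curvature(
    lambda t: (math.cos(t) + 4 * math.cos(2 * t),
               -math.sin(t) + 4 * math.sin(2 * t),
               -3 * math.cos(3 * t)),
    lambda t: (-math.sin(t) - 8 * math.sin(2 * t),
               -math.cos(t) + 8 * math.cos(2 * t),
               9 * math.sin(3 * t)))

print(circle, trefoil, 4 * math.pi)
```

The circle has total curvature $2\pi$, while the trefoil, being knotted, must have total curvature larger than $4\pi$.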


162. MORLEY’S THEOREM 


An angle trisector of an angle $\alpha = \angle(CAB)$ in $\mathbb{R}^2$ is a pair of lines PA, QA through A such 
that the angles $\angle(CAP), \angle(PAQ), \angle(QAB)$ are all equal. Given a triangle ABC, we can look 
at the angle trisectors at each point and intersect the adjacent trisectors, leading to a triangle 
PQR inside the triangle. The triangle PQR is called the Morley triangle of ABC. Morley’s 
theorem is 


Theorem: For any triangle ABC, the Morley triangle is equilateral. 


Morley’s theorem was discovered in 1899 by Frank Morley. A short proof was given in 1995 
by John H. Conway: assume the triangle ABC has angles $3\alpha, 3\beta, 3\gamma$ so that $\alpha+\beta+\gamma = \pi/3$. 
Start with an equilateral triangle PQR of side length 1. Build three triangles PQA with angles 
$\beta+\pi/3, \alpha, \gamma+\pi/3$, QRC with angles $\alpha+\pi/3, \gamma, \beta+\pi/3$ and a triangle RPB with angles 
$\gamma+\pi/3, \beta, \alpha+\pi/3$. Then fill in three other triangles ACQ, CBR, BAP with angles $\alpha, \gamma, \beta+2\pi/3$ 
and $\gamma, \beta, \alpha+2\pi/3$ and $\beta, \alpha, \gamma+2\pi/3$. These seven triangles fit together to a triangle of the shape ABC. 
See [209]. 


163. RISING SUN LEMMA 


Given an interval [a,b], the space C([a,b]) denotes the vector space of all continuous functions 
on [a,b]. For $g \in C([a,b])$, we say the set $E(g) = \{x \in (a,b) \mid g(t) > g(x) \text{ for some } t > x\}$ has 
the rising sun property if E is open, and E is empty if and only if g is decreasing and 
if not empty, then E can be written as $E = \bigcup_n (a_n,b_n)$ with pairwise disjoint intervals with 
$g(a_n) \le g(b_n)$. See [50]. 


Theorem: For $f \in C([a,b], \mathbb{R})$, the set E(f) has the rising sun property. 


The theorem is due to F. Riesz. The name “rising sun lemma" reportedly first appeared 
in [27]. The picture is to draw the graph of the function f. If light comes from a distant source 
parallel to the x-axis, then the intervals $(a_n, b_n)$ delimit the hollows that remain in the shade 
at the moment of sunrise. The lemma is used in real analysis to prove that every monotone 
non-decreasing function is almost everywhere differentiable. 


164. UNIFORM CONTINUITY 


Uniform continuity is a stronger version of continuity. But unlike continuity, which is defined for 
maps between topological spaces, uniform continuity needs more structure like a metric space 
or more generally a topological space with a uniform structure. Given two metric spaces X 
and Y, a function $f : X \to Y$ is called continuous if $f^{-1}(U)$ is open for every open U in Y. A 
function f is called uniformly continuous if there exists a sequence of numbers $M_n \to 0$ such 
that for every positive $n \in \mathbb{N}$, the condition $d(x,y) < 1/n$ implies that $d(f(x), f(y)) \le M_n$. 


Theorem: For compact X, continuous implies uniformly continuous. 


The theorem is due to Eduard Heine and Georg Cantor. Heine is known also for the Heine- 
Borel theorem which states that in Euclidean spaces, the class of closed and bounded sets 
agrees with the class of compact sets. The proof of the Heine-Cantor theorem uses the 
extreme value theorem assuring that a continuous function on a compact space X achieves 
a maximum. Look for every n and every x at the minimal $M_n(x)$ such that if $|x - y| < 1/n$, 
then $|f(x) - f(y)| \le M_n(x)$. Now $M_n(x)$ is non-negative and finite and depends continuously 
on x. By the extreme value theorem there is a maximum. We call it $M_n$. This assures now 
that if $|x - y| < 1/n$, then $|f(x) - f(y)| \le M_n$. The Bolzano-Weierstrass or sequential 
compactness theorem assures that a bounded sequence in $\mathbb{R}^n$ has a convergent subsequence. 
This is used in the intermediate value theorem assuring that if f(a) < 0 and f(b) > 0, then 
there is an x with f(x) = 0. The Heine-Cantor theorem together with the intermediate value 
theorem assures that continuous functions are Riemann integrable. The additional uniform 
structure or metric structure is also necessary when defining completeness in the sense that 
every Cauchy sequence converges. Completeness is not a property of topological spaces: (0, 1) 
is not complete but $\mathbb{R}$ is complete even though the two spaces are homeomorphic. 


165. JORDAN NORMAL FORM 


An n x n matrix A is similar to another n x n matrix B if there exists an invertible n x n 
matrix S such that $B = S^{-1} A S$. A matrix is in Jordan normal form (also called Jordan 
canonical form) if it is block diagonal, where each block is a Jordan block. An m x m matrix 
J is a Jordan block, if $J e_1 = \lambda e_1$ and $J e_k = \lambda e_k + e_{k-1}$ for k = 2,...,m. An example of a 
3 x 3 Jordan block matrix is $J = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}$. In other words, J is of the form $J = \lambda 1 + N$, 
where N is nilpotent: $N^m = 0$ and more precisely only has 1 in the super diagonal above the 
diagonal. 


Theorem: Every n x n matrix is similar to a matrix in Jordan normal form. 


Up to the order of the Jordan blocks, the Jordan normal form is unique. If each Jordan block 
is a 1 x 1 matrix, then the matrix is called diagonalizable. The spectral theorem assures 
that a normal matrix $AA^* = A^*A$ is diagonalizable. Not every matrix is diagonalizable as the 
shear matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, a 2 x 2 Jordan block, shows. The theorem has been stated first 
by Camille Jordan in 1870. For history, see [83]. The Jordan-Chevalley generalization states 
that over an arbitrary perfect field, a matrix is similar to B + N, where B is semi-simple and 
N is nilpotent and BN = NB. (See page 17). A matrix B is called semi-simple if every 
B-invariant linear subspace V has a complementary B-invariant subspace. For algebraically 
closed fields, semi-simple is equivalent to being conjugate to a diagonal matrix. As for the condition 
on the field: a field k is called perfect if every irreducible polynomial over k has distinct roots. 


166. HIPPOCRATES THEOREM 


The Hippocrates theorem dealing with the lunes of Hippocrates or the lunes of Alhazen 
is a theorem in planar geometry: given a triangle ABC in $\mathbb{R}^2$ with right angle $\beta$ at B, one can 
draw the circles with diameter AC, AB and BC centered at the midpoints (A+C)/2, (A+B)/2 
and (B+C)/2. They define two “moon-shaped" regions U,V bounded by circles called the 
lunes. 


Theorem: The area of U plus the area of V is the area of the triangle. 


The proof directly follows from Pythagoras by relating the areas of half discs and triangle. The 
result is remarkable as it was historically the first attempt for the quadrature of the circle. 
The lunes are bounded by circles, while the triangle is bounded by line segments. The theorem 
does the quadrature of the lunes. Hippocrates of Chios lived from about 470 to 410 BC. 
For history see [37] page 37. 
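The area identity can be verified numerically. In the following sketch, each lune is computed as the half disc over a leg minus the circular segment which the circle over the hypotenuse cuts off at that leg; the function names `segment_area`, `lune_area` and `lunes_total` are hypothetical.

```python
import math

def segment_area(radius, chord):
    """Area of the (minor) circular segment cut off by a chord of given length."""
    theta = 2 * math.asin(chord / (2 * radius))  # central angle of the chord
    return 0.5 * radius ** 2 * (theta - math.sin(theta))

def lune_area(leg, hypotenuse):
    """Half disc over the leg minus the segment of the circle over the hypotenuse."""
    half_disc = 0.5 * math.pi * (leg / 2) ** 2
    return half_disc - segment_area(hypotenuse / 2, leg)

def lunes_total(a, c):
    """Sum of the areas of the two lunes for a right triangle with legs a and c."""
    b = math.hypot(a, c)
    return lune_area(a, b) + lune_area(c, b)

print(lunes_total(3.0, 4.0), 0.5 * 3.0 * 4.0)
```

For the right triangle with legs 3 and 4, both numbers agree with the triangle area 6 up to rounding errors.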


167. FERMAT-HAMILTON PRINCIPLE 


A point x is called a critical point of a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, if $\nabla f(x) = 0$, 
where $\nabla f$ is the gradient of f. A point $x_0$ is called a local maximum of f if there exists r > 0 
such that $f(x) \le f(x_0)$ for all $|x - x_0| < r$. The local maximum does not have to be isolated. 
For a constant function for example, every point is a local maximum. The local maximum also 
does not have to be a global maximum. The function $f(x) = x^4 - x^2$ has a local maximum 
at x = 0 but this is not a global maximum because f(2) > f(0). 


Theorem: If $x_0$ is a local maximum of f, then $\nabla f(x_0) = 0$. 


This generalizes to the calculus of variations, where $\nabla f$ is replaced by the variation. In 
the case when $f(x) = \int_a^b L(x(t), x'(t)) \, dt$ is a function on the space of curves $[a,b] \to \mathbb{R}^n$ (one 
calls this then a functional or action functional S) then we can look at the problem to minimize 
the action. In that case, the condition that the gradient vanishes is $\delta S = L_x(x(t),x'(t)) - \frac{d}{dt} L_{x'}(x(t),x'(t)) = 0$. This so 
called Hamilton principle can be seen as a generalization of the Fermat principle to infinite 
dimensions. The equations $\delta S = 0$ are called the Euler-Lagrange equations or Lagrange 
equations of the second kind. They are the starting point of Lagrangian mechanics. 
Fermat’s original paper deals with the single variable situation but the higher dimensional 
situation is similar. Fermat in some sense already looked at the action principle which is the 
situation to minimize the arc length of a path in a medium with two different properties like 
water and air. In that case the shortest path is described by the Fermat law or Fermat’s 
principle. 


168. ALTERNATING SIGN 


An alternating sign matrix is a square matrix with entries in {0,1,—1} such that the sum 
of each row and column is 1 and the nonzero entries in each row and column alternate in sign. 


Theorem: The number of n x n alternating sign matrices is $\prod_{k=0}^{n-1} \frac{(3k+1)!}{(n+k)!}$. 


The numbers $\prod_{k=0}^{n-1} (3k+1)!/(n+k)!$ are known as the Robbins numbers or Andrews-Mills- 
Robbins-Rumsey numbers and are the integer sequence A005130 [1]. The alternating sign 
conjecture was popularized by David Robbins in [563]. The theorem was proven by Doron 
Zeilberger in 1994 [717]. A short proof was given by Greg Kuperberg in 1996 [428]. A book 
about it is [89]. 
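For small n, the formula can be compared with a direct count. The following sketch first lists all admissible rows and then checks the column conditions; the function names `robbins`, `is_asm_line` and `count_asm` are hypothetical.

```python
from itertools import product
from math import factorial

def robbins(n):
    """Product of (3k+1)!/(n+k)! for k = 0, ..., n-1."""
    num, den = 1, 1
    for k in range(n):
        num *= factorial(3 * k + 1)
        den *= factorial(n + k)
    return num // den

def is_asm_line(line):
    """A row or column must sum to 1 and its nonzero entries must alternate in sign."""
    nonzero = [v for v in line if v != 0]
    if sum(line) != 1:
        return False
    return all(a == -b for a, b in zip(nonzero, nonzero[1:]))

def count_asm(n):
    """Count n x n alternating sign matrices by brute force over admissible rows."""
    rows = [r for r in product((-1, 0, 1), repeat=n) if is_asm_line(r)]
    count = 0
    for matrix in product(rows, repeat=n):
        if all(is_asm_line(col) for col in zip(*matrix)):
            count += 1
    return count

print([robbins(n) for n in range(1, 6)])
print([count_asm(n) for n in range(1, 5)])
```

The formula gives 1, 2, 7, 42, 429 for n = 1,...,5 and the brute force count confirms the first four values.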


169. COMBINATORIAL CONVEXITY 


A finite set P of points in $\mathbb{R}^d$ is called r-convex, if there is a partition of P into r sets such that 
their convex hulls intersect simultaneously in a non-empty set. Tverberg’s theorem states: 


Theorem: A set of (r - 1)(d + 1) + 1 points in $\mathbb{R}^d$ is r-convex. 


The decomposition of P into r subsets is called the Tverberg partition. In the one-dimensional 
case d = 1, the theorem assures that 2r - 1 points on the line are r-convex. For r = 3 for example, 
this means that 5 points are 3-convex. If the points are arranged $x_1 < x_2 < x_3 < x_4 < x_5$, 
the Tverberg partition is $\{\{x_1,x_4\}, \{x_2,x_5\}, \{x_3\}\}$. For r = 2, it implies Radon’s theorem which 
tells that d + 2 points in $\mathbb{R}^d$ can be partitioned into 2 sets whose convex hulls intersect. For example, 
4 points $\{x_1, x_2, x_3, x_4\}$ in $\mathbb{R}^2$ can be partitioned into two sets such that their convex hulls 
intersect. Indeed, if the 4 points are in convex position, they define a quadrilateral and the partition 
$\{\{x_1,x_3\}, \{x_2,x_4\}\}$ defines the two diagonals of the quadrilateral. The theorem has been proven by Helge Tverberg 
in 1966. See [47]. 


170. THE UMLAUFSATZ 


Let r be a continuously differentiable closed curve in $\mathbb{R}^2$. If r(t) is a parametrization for which 
the speed is 1, we have $r'(t) = (\cos(\alpha(t)), \sin(\alpha(t)))$ and a signed curvature $\kappa(t) = \alpha'(t)$. 
If $[0,2\pi]$ is the parameter interval, then $K = \int_0^{2\pi} \kappa(t) \, dt$ is the total curvature. The Hopf 
Umlaufsatz is: 


Theorem: For $r \in C^1$, the total curvature of a simple closed plane curve is $2\pi$. 
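The total signed curvature can be computed numerically for a parametrization which does not need to have unit speed, as $\int (x'y'' - y'x'')/(x'^2 + y'^2) \, dt$. The sketch below uses the hypothetical function name `total_signed_curvature` and compares an ellipse, which is a simple closed curve, with a figure eight curve, which is closed but not simple; both example curves are assumptions made for the illustration.

```python
import math

def total_signed_curvature(d1, d2, period=2 * math.pi, steps=20000):
    """Riemann sum of (x'y'' - y'x'') / (x'^2 + y'^2) over one period."""
    dt = period / steps
    total = 0.0
    for k in range(steps):
        t = k * dt
        vx, vy = d1(t)
        ax, ay = d2(t)
        total += (vx * ay - vy * ax) / (vx * vx + vy * vy) * dt
    return total

# Ellipse r(t) = (3 cos t, sin t), traversed counterclockwise.
ellipse = total_signed_curvature(
    lambda t: (-3 * math.sin(t), math.cos(t)),
    lambda t: (-3 * math.cos(t), -math.sin(t)))

# Figure eight r(t) = (sin t, sin(2t)/2), a closed curve with a self-intersection.
eight = total_signed_curvature(
    lambda t: (math.cos(t), math.cos(2 * t)),
    lambda t: (-math.sin(t), -2 * math.sin(2 * t)))

print(ellipse, eight)
```

The ellipse gives $2\pi$, while the figure eight gives 0. This shows that the curve needs to be simple.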


The theorem was proven in 1935 by Heinz Hopf using a homotopy proof: define f(s,t) as 
the argument of the line through r(s) and r(t) and continuously extend it to s = t as the argument 
of the tangent line. The direct line from (0,0) to (1,1) in the parameter st-plane gives a total 
angle change of $2\pi n$ where n is an integer. Now deform the path from (0,0) to (1,1) so that 
it first goes straight from (0,0) to (0,1), then straight from (0,1) to (1,1). Both lines produce 
an angle change of $\pi$ and show that n = 1. The theorem can be generalized to a Gauss-Bonnet 
theorem for planar regions G. The total curvature of the boundary is $2\pi$ times the Euler 
characteristic of G. For a discrete version, see [397]. 


171. FROBENIUS DETERMINANT 


The Frobenius determinant theorem tells how the determinant of the “multiplication table 
matrix" factors into irreducible polynomials: if $G = \{g_1, \dots, g_n\}$ is a finite group and $x_i$ is a 
variable associated to the group element $g_i$, then the matrix $A_{ij} = x_{g_i g_j}$ satisfies 


Theorem: $\det(A) = \prod_{j=1}^{r} p_j(x_1, \dots, x_n)^{d_j}$ 


Here, the $p_j$ are irreducible polynomials, $d_j = \deg(p_j)$ and r is the number of conjugacy classes of G. 
For an Abelian group G, there are n conjugacy classes. The theorem had been conjectured in 1896 by 
Richard Dedekind. Frobenius proved it. See [306] [138]. 


172. KÖNIG’S THEOREM 


A matching M in a finite simple graph G = (V,E) with vertex set V and edge set E 
is a subset M of the edges E in which no two edges have a common vertex. A vertex cover 
C is a set of vertices such that every edge in E contains at least one vertex of C. The matching 
number is the maximal size of a matching and the vertex cover number is the minimal size of a 
vertex cover. A bipartite graph is a graph for which $V = V_1 \cup V_2$ can be partitioned into two disjoint sets 
$V_1, V_2$ such that all edges connect vertices from different sets. König’s theorem, from 1931, also 
known as König-Egerváry theorem is: 


Theorem: For bipartite G, matching number = vertex cover number. 


The vertex cover problem, the problem to find the vertex cover number, is a classical NP- 
complete problem. For example, for a cyclic graph $G = C_{2n}$ with 2n vertices {1, 2, 3, ..., 2n} 
(which is an example of a bipartite graph), the set C = {2, 4, ..., 2n} is a minimal vertex cover. 
The edges M = {(1,2), (3,4), ..., (2n-1,2n)} are a maximal matching. The example of an 
odd cyclic graph like $C_9$ (which is not bipartite) already shows that the bipartite condition is 
necessary: for $C_9$, the set {1, 3, 5, 7, 8} is a minimal cover and M = {(1,2), (3,4), (5,6), (7,8)} 
is a maximal matching. 
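For small graphs, both numbers can be found by exhaustive search. The following sketch does this for cyclic graphs with vertices labeled 0,...,n-1; the function names `cycle_edges`, `matching_number` and `vertex_cover_number` are hypothetical and the search is exponential in the size of the graph.

```python
from itertools import combinations

def cycle_edges(n):
    """Edges of the cyclic graph C_n on the vertices 0, ..., n-1."""
    return [(i, (i + 1) % n) for i in range(n)]

def matching_number(edges):
    """Largest number of pairwise disjoint edges, found by exhaustive search."""
    for k in range(len(edges), 0, -1):
        for M in combinations(edges, k):
            vertices = [v for e in M for v in e]
            if len(vertices) == len(set(vertices)):
                return k
    return 0

def vertex_cover_number(n, edges):
    """Smallest number of vertices touching every edge, found by exhaustive search."""
    for k in range(n + 1):
        for C in combinations(range(n), k):
            cover = set(C)
            if all(u in cover or v in cover for u, v in edges):
                return k

for n in (6, 8, 5, 9):
    edges = cycle_edges(n)
    print(n, matching_number(edges), vertex_cover_number(n, edges))
```

For the bipartite graphs $C_6$ and $C_8$ the two numbers agree, while for the odd cycles $C_5$ and $C_9$ the vertex cover number is larger by one.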


The origin of the theorem is attributed to Dénes König, who proved it in 1931 and wrote a 
precursor paper in 1916, where he proved that a regular (constant vertex degree) bipartite 
graph has a perfect matching (a matching which covers all vertices). For a proof, see [168] 
(Chapter 2). 


173. POLYNOMIAL ERGODIC THEOREMS 


Birkhoff’s ergodic theorem stating that $S_{n,f}(x) = \frac{1}{n} \sum_{k=1}^{n} f(T^k x)$ converges for $n \to \infty$ point- 
wise for $\mu$ almost every x for an automorphism T of a probability space $(X, \mathcal{A}, \mu)$ and a function 
$f \in L^p(X)$ with $1 \le p < \infty$ has been generalized in 1988 by Jean Bourgain to polynomial 
averages $S_{P,n,f}(x) = \frac{1}{n} \sum_{k=1}^{n} f(T^{P(k)} x)$, where P is a polynomial with integer coefficients. 


Theorem: $S_{P,n,f}(x)$ converges point-wise almost everywhere if p > 1. 


Bourgain first proves a maximal ergodic theorem and extends it also to $\mathbb{Z}^d$ actions 
generated by d commuting transformations. The starting point is that for $f \in L^2(X, \mu)$, there is 
for any integer t a maximal bound $|\sup_n |S_{k^t,n,f}||_2 \le C |f|_2$. This implies for example that $\frac{1}{n} \sum_{k=1}^{n} f(x + k^t \alpha) \to 
\int f(x) \, dx$ for any irrational $\alpha$ and any bounded measurable function. The case t = 2 leads 
to results about sums like $\frac{1}{n} \sum_{k=1}^{n} e^{2\pi i k^2 \alpha}$ which relate to Gauss sums $S(q,a) = \frac{1}{q} \sum_{k=0}^{q-1} e^{2\pi i k^2 a/q}$. 
One can for example estimate $|\sum_{k=1}^{n} e^{2\pi i k^2 a/q}| \le C(n/\sqrt{q} + \sqrt{n \log(q)} + \sqrt{q \log(q)})$ [78]. The 
case p = 1 is known to fail [96]. The results have been generalized to bilinear ergodic averages 
like $S_{n,f,g}(x) = \frac{1}{n} \sum_{k=1}^{n} f(T^{ak} x) g(T^{bk} x)$ for integers a,b where $f \in L^p, g \in L^q$ with $1 < 
p,q \le \infty$ and $1/p + 1/q < 3/2$ and to non-conventional bilinear polynomial averages 
$\frac{1}{n} \sum_{k=1}^{n} f(T^k x) g(T^{P(k)} x)$ [33], where P is an integer polynomial of degree $d \ge 2$ and $f \in L^p, g \in 
L^q$ with $1 < p,q < \infty$ and $1/p + 1/q \le 1$. 


174. WANTZEL’S THEOREM ON ANGLE TRISECTION 


A classical problem in geometry asks to trisect an angle using an unmarked straightedge 
(ruler) and compass only. The insistence on restricting constructions to ruler and compass 
goes back to Euclid. Archimedes already knew how to solve the problem 
using a marked straightedge meaning that one has additionally to the constructed points also 
an additional real number to work with. One can trisect an angle using an additional curve like 
an Archimedean spiral given in polar coordinates as $r = \theta$. In that case, trisecting 
the radius $r = \sqrt{x^2 + y^2}$ of a given point $(x,y) = (r\cos(\theta), r\sin(\theta))$ on the spiral gives the angle $\theta/3$ by 
intersecting the circle of radius r/3 with the spiral $r = \theta$. More generally, a curve which can 
be used to trisect an angle is called a trisectrix. 


Theorem: One can not trisect a general angle with ruler and compass. 


The theorem follows from Galois theory. An angle $\alpha$ can be trisected if and only if the poly- 
nomial $4x^3 - 3x - \cos(\alpha)$ is reducible over the field $\mathbb{Q}(\cos(\alpha))$. The angle $\alpha = 60° = \pi/3$ 
for example is not trisectable. The first proof of the impossibility of trisecting an arbitrary 
angle was given by Pierre Wantzel in 1837. Wantzel also solved there the problem of doubling 
the cube and characterized constructible regular n-gons as the ones with $n = 2^m p_1 \cdots p_k$ 
with distinct Fermat primes $p_j = 2^{2^{m_j}} + 1$. Bieberbach realized in 1932 that every cubic 
construction can be traced back to the trisection of an angle and the extraction of the third 
root [59]. This has been formulated more precisely by Gleason in 1988 [250] who states 
in that article as Theorem 1: a real cubic equation can be solved geometrically using ruler 
and compass and angle-trisector if and only if its roots are all real. Gleason shows from this 
also that a regular n-gon can be constructed by ruler, compass and angle-trisector if and 
only if the prime factorization of n has the form $2^r 3^s p_1 p_2 \cdots p_k$ with $k \ge 0$, where all primes 
$p_j > 3$ are distinct and have the form $2^a 3^b + 1$. An example is $p = 13 = 2^2 \cdot 3 + 1$. The corre- 
sponding 13-gon is called the triskaidecagon for which Gleason gives a concrete construction 
using that $2\cos(2\pi k/13)$ are the roots of the polynomial $x^6 + x^5 - 5x^4 - 4x^3 + 6x^2 + 3x - 1$ 
which factors over $\mathbb{Q}(\sqrt{13})$ because with $\lambda = (1 - \sqrt{13})/2$, $\bar{\lambda} = (1 + \sqrt{13})/2$ one can write it as 
$(x^3 - x - 1 + \lambda(x^2 - 1))(x^3 - x - 1 + \bar{\lambda}(x^2 - 1))$, where the first factor has the root $2\cos(2\pi/13)$. 
For more on angle trisection and especially many failed attempts, see [185]. 
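For an angle with rational cosine, the reducibility criterion can be tested with the rational root theorem, because a cubic polynomial is reducible over the rationals if and only if it has a rational root. The sketch below, with the hypothetical function names `divisors` and `rational_roots`, treats the angle 60° where $4x^3 - 3x - 1/2 = 0$ becomes $8x^3 - 6x - 1 = 0$ and, for comparison, the angle 180° where the polynomial is $4x^3 - 3x + 1$.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of an integer polynomial, highest coefficient first.

    Assumes that the leading and the constant coefficient are nonzero.
    """
    roots = set()
    for p in divisors(coeffs[-1]):
        for q in divisors(coeffs[0]):
            for candidate in (Fraction(p, q), Fraction(-p, q)):
                value = Fraction(0)
                for c in coeffs:  # Horner evaluation
                    value = value * candidate + c
                if value == 0:
                    roots.add(candidate)
    return roots

print(rational_roots([8, 0, -6, -1]))  # angle 60 degrees
print(rational_roots([4, 0, -3, 1]))   # angle 180 degrees
```

The first polynomial has no rational root, so 60° can not be trisected. The second has the root 1/2 = cos(60°), reflecting that 180° can be trisected.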


175. PREISSMANN’S THEOREM 


Let $\mathcal{M}_-$ denote the class of compact negatively curved Riemannian manifolds M. Negative 
curvature means that all sectional curvatures of M are negative everywhere. Let $\pi_1(M)$ denote 
the fundamental group of M. For positively curved manifolds, the theorem of Synge shows 
that the fundamental group $\pi_1(M)$ is finite; it can be trivial like for a sphere $S^d, d > 1$ or be a 
finite group like $\pi_1(M) = \mathbb{Z}_2$ for the projective space $M = \mathbb{RP}^d$ for d > 1. For a flat manifold 
like the torus $T^d$, the fundamental group can already be the infinite group $\mathbb{Z}^d$. This changes for 
negative curvature. Preissmann showed that if $\pi_1(M)$ is cyclic, then there is only one closed 
geodesic and that there is maximally one geodesic in each homotopy class of closed curves in 
M. Here is Preissmann’s theorem which deals with non-trivial subgroups G of $\pi_1(M)$ meaning 
that G should not be the trivial 1-point group. 


Theorem: If $G \subset \pi_1(M)$ for $M \in \mathcal{M}_-$ is Abelian then $G = \mathbb{Z}$. 


A consequence is that the torus $T^n$ can not admit a Riemannian metric of negative sectional 
curvature. Preissmann gives in his paper also the corollary that the product of two negatively 
curved Riemannian manifolds can not carry a metric with negative curvature. An analogous 
result for positive curvature is not known. The famous product conjecture of Heinz Hopf asks 
whether the product manifold $S^2 \times S^2$ can carry a metric of positive curvature (see [708]). 
Preissmann who was born at Neuchatel in Switzerland in 1916, went to school at La Chaux-de- 
Fonds. He studied mathematics from 1934 to 1938 at the ETH and worked there until 1942 as an 
assistant to Kollros and Gonseth, writing his thesis under the guidance of Heinz Hopf, where the 
theorem appears [550]. Preissmann later got interested in hydraulic computations given 
the Swiss boom of hydro-power developments. After having been an actuary in a life insurance 
from 1942-1946, he joined VAWD until 1958, then led the Department of Mathematical Methods 
of the hydraulics laboratory SOGREAH in Grenoble from 1958-1972, retiring in 1981. See [154]. 


176. KILLING-HOPF THEOREM 


A space form M is a quotient A/G, where A is a sphere, a Euclidean space or a hyperbolic space 
and G is a group acting freely (gx = x is only possible for g = 1) and discontinuously. The latter 
means that for any compact K in A, the set of $g \in G$ with $gK \cap K \neq \emptyset$ is finite. A Riemannian 
manifold has constant curvature if all sectional curvatures are the same everywhere. The 
Killing-Hopf theorem is: 


Theorem: Constant curvature manifolds are space forms. 


The theorem is due to Wilhelm Killing from 1891 [382] and Heinz Hopf 1926 [330]. See 


for the topic of constant curvature manifolds. 


177. BALLOT THEOREM 


Let $X_i$ be independent identically distributed random variables taking values $e_k = (0, \dots, 0, 
1, 0, \dots, 0)$ in $\mathbb{Z}^d$ with probability $p_k$. If $p_1 > \cdots > p_d$, we can look at the multi-dimensional 
random walk $S_n = \sum_{i=1}^{n} X_i$. What is the probability that the walk starting at 0 remains in the 
open cone $Q = \{x_1 > x_2 > \cdots > x_d\}$ at all positive times? The answer is given by the Ballot 
theorem. It expresses the probability as a van der Monde determinant: 


Theorem: P[S, € Q,Vn > 0] = Lic; (Pi =): 


The case d = 2 is the classical result due to Joseph Bertrand and appears in virtually 
every probability textbook like [219] who also points out that the theorem has been proven 
earlier by William Whitworth who looked at the problem in a different context like the 
problem of counting the number of weak orderings. The historical context is voting and explains 
the etymology of the theorem [7]: if candidate A gets n votes and candidate B gets m < n votes, 
then the probability that during the counting process A always has more votes than B is 
(n-m)/(n+m). If $P_{n,m}$ counts the number of paths always favorable for A, then the recursion 
$P_{n+1,m+1} = P_{n+1,m} + P_{n,m+1}$ holds. As the Binomial coefficients $B_{n,m}$ and so $D_{n,m} = B_{n,m} - P_{n,m}$ 
satisfy the same recursion, it can be shown by induction that $D_{n,m} = 2m B_{n,m}/(n+m)$, leading 
to the result. The multidimensional result has been studied in [715] [246]. 
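The classical two candidate case can be verified by listing all possible counting orders. The sketch below assumes that candidate A gets n votes and candidate B gets m < n votes and that all orders are equally likely; the function name `count_always_ahead` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def count_always_ahead(n, m):
    """Number of counting orders in which A (n votes) strictly leads B (m votes) throughout."""
    count = 0
    for b_positions in combinations(range(n + m), m):
        b_set = set(b_positions)
        lead = 0
        ok = True
        for i in range(n + m):
            lead += -1 if i in b_set else 1
            if lead <= 0:  # A is not strictly ahead after vote i
                ok = False
                break
        if ok:
            count += 1
    return count

for n, m in [(3, 1), (4, 2), (5, 3), (6, 2)]:
    probability = Fraction(count_always_ahead(n, m), comb(n + m, m))
    print(n, m, probability, Fraction(n - m, n + m))
```

In each case, the probability obtained by counting agrees with (n-m)/(n+m).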


178. POINCARÉ-HOPF 


Let F be a smooth vector field on a compact n-manifold M with finitely many equilibrium points 
F(x) = 0. The index $i_F(x)$ of F at such an equilibrium point x is defined as the degree 
of the map $u \in S(x) \mapsto F(u)/|F(u)| \in S^{n-1}$, where S(x) is the boundary of a small enough 
ball containing x in the interior. Let $\chi(M)$ denote the Euler characteristic of M. The 
Poincaré-Hopf index formula links the topological quantity $\chi(M)$ with the analytic index 
sum: 


Theorem: $\sum_{x : F(x) = 0} i_F(x) = \chi(M)$ 


The formula can be used to compute the Euler characteristic of a manifold M: just construct 
a smooth vector field F with finitely many equilibrium points and add up their indices. For 
example, on the n-torus $M = T^n$, there is the constant vector field F(x) = v without equilibrium 
points. Therefore $\chi(M) = 0$. On a 2n-sphere embedded as {|x| = 1} in $\mathbb{R}^{2n+1}$ there are circles 
in SO(2n+1,R) that have two fixed points of index 1, so that the Euler characteristic is 2. On a 
2n+1 sphere M, there are circles in SO(2n+2,R) without fixed points so that $\chi(M) = 0$. 
A special case is if f is a Morse function on M, where $F = \nabla f$, the equilibrium points of F 
are the critical points of f. In that case $i_F(x) = (-1)^{m(x)}$, where m(x) is the Morse index, the 
number of negative eigenvalues of the Hessian $d^2 f(x)$. Poincaré wrote the first article in 1885 
[545]. Then appeared Hopf’s articles for hypersurfaces and for vector fields. 


179. SAMPLING THEOREM 


Let S be the Schwartz space of complex-valued functions f in $C^\infty(\mathbb{R}, \mathbb{C})$ such that $\|f\|_{m,n} = 
\sup_{x \in \mathbb{R}} |x^m f^{(n)}(x)| < \infty$ for all m, n. The Fourier transform $\hat{f}$ of $f \in S$ is defined as 
$\hat{f}(k) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-ikx} \, dx$. The Nyquist-Shannon sampling theorem tells that if $\hat{f}$ is 
supported on $[-\pi, \pi]$, then $\{f(n), n \in \mathbb{Z}\}$ determines f: 


Theorem: $f(t) = \sum_{n=-\infty}^{\infty} f(n) \, \mathrm{sinc}(\pi(n - t))$ 


It uses the sinc function sinc(x) = sin(x)/x. The explicit reconstruction formula is also known 
as the Whittaker-Shannon interpolation formula. Whittaker had already found that formula 
in 1915, while Shannon’s work [598] is the start of information 
theory. The result was also spearheaded by Nyquist 1928. We followed [629]. 


180. PETER WEYL THEOREM 


Let G be a compact topological group and let C(G,C) denote the Banach space of continuous 
complex-valued functions on G, equipped with the uniform norm $|f|_\infty = \max_{x \in G} |f(x)|$. Denote 
by $\pi : G \to GL(V)$ a group representation of G, where V is a complex vector space. This 
means $\pi(gh) = \pi(g)\pi(h)$ for any $g,h \in G$. A matrix coefficient of G is a map $\phi : G \to \mathbb{C}$ 
which has the form $\phi(g) = L(\pi(g))$, where $\pi$ is a representation of G and where L is a linear functional 
$GL(V) \to \mathbb{C}$. An example of a linear functional on GL(V) is the trace tr(A) or another linear 
combination of matrix entries $A_{ij}$ (explaining the name). The Peter-Weyl theorem is: 


Theorem: The set of matrix coefficients is dense in C(G,C). 


This implies that the matrix coefficients are also dense in the Hilbert space $L^2(G, \mu)$ defined by 
the Haar measure $\mu$ on G. If $\pi$ is a unitary representation on a Hilbert space $H = (V, (\cdot,\cdot))$, one 
can write $\pi$ as a direct sum of irreducible unitary representations and the matrix elements give 
an explicit orthonormal basis in $L^2(G)$: make a list of representatives of the isomorphism 
classes $\pi$ of irreducible unitary representations of G, then take the basis elements $\sqrt{d(\pi)} \, \pi(g)_{ij}$, 
where $d(\pi)$ is the degree of the representation. The theorem was proven by Fritz Peter and 
Hermann Weyl in 1927 [537]. The result follows from the Stone-Weierstrass theorem if G is a 
matrix group and especially for Lie groups which are known to be matrix groups. Not much 
seems to be known about Fritz Peter (1899-1949) whose residence is in the paper [537] given 
as Karlsruhe and to whom Weyl refers as “his student". The book states that Peter got 
a doctorate in Göttingen in 1923 (with the title: Über Brechungsindizes und Absorptionskon- 
stanten des Diamanten zwischen » 644 und 266), under the guidance of Max Born [587]. A 
conference proceeding lists him later as a teacher at a school in Schloss Salem near Überlingen 
in Germany. See [305]. 


181. KRUSKAL-KATONA THEOREM 


A finite abstract simplicial complex G is a finite set of non-empty sets which is closed 
under the operation of taking finite non-empty subsets. The dimension of a set $x \in G$ is |x| - 1, 
where |x| is the cardinality of x. The f-vector $f = (f_0, f_1, \dots, f_d) \in \mathbb{N}^{d+1}$ counts the 
number $f_k$ of k-dimensional sets x in G. If $n = B(n_i, i) + B(n_{i-1}, i-1) + \cdots + B(n_j, j)$ is the 
Binomial development of n at level i, define $n^{(i)} = B(n_i, i+1) + \cdots + B(n_j, j+1)$. The 
theorem of Kruskal-Katona characterizes the possible f-vectors which simplicial complexes 
can have: 


Theorem: f is the f-vector of a complex if and only if $f_i \le f_{i-1}^{(i)}$ for all $1 \le i \le d$. 


The theorem was found by Joseph Kruskal (1963) (a brother of Martin Kruskal known in the 
context of solitons) and Gyula Katona (1968). See [229]. Because the result is sharp, it is 
often mentioned in the context of extremal set theory. The result implies the Erdős-Ko- 
Rado theorem [208]. The latter is the result about a finite set G of subsets of {1,...,n} of 
cardinality k such that each pair has a non-empty intersection: if $n \ge 2k$, then the number 
of sets in G is less than or equal to the Binomial coefficient B(n-1,k-1). A bit easier to 
state is the following special case of the Kruskal-Katona theorem formulated by Lovász: if 
$f_{i-1} = B(m,i)$, then $f_{i-1-r} \ge B(m,i-r)$ for any $r \ge 0$. The fact that these statements are 
sharp can be seen when looking at the complete complex G consisting of all non-empty subsets 
of {1,2,...,n}, where $f_k(G) = B(n,k+1)$ which means m = n, i = k+1 in the above 
notation. More specifically, if G = {{1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, 
{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}, where the f-vector is (4,6,4,1) we have the 
situation of Lovász. 
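The Binomial development and the resulting condition can be computed explicitly. The sketch below assumes the standard form of the Kruskal-Katona condition, $f_i \le f_{i-1}^{(i)}$ for $i \ge 1$, where $f_k$ counts the k-dimensional sets; the function names `cascade`, `upper` and `satisfies_kruskal_katona` are hypothetical.

```python
from math import comb

def cascade(n, i):
    """Binomial development of n at level i as a list of pairs (n_k, k)."""
    terms = []
    k = i
    while n > 0 and k >= 1:
        a = k
        while comb(a + 1, k) <= n:  # largest a with B(a, k) <= n
            a += 1
        terms.append((a, k))
        n -= comb(a, k)
        k -= 1
    return terms

def upper(n, i):
    """The number n^(i) = B(n_i, i+1) + ... + B(n_j, j+1)."""
    return sum(comb(a, k + 1) for a, k in cascade(n, i))

def satisfies_kruskal_katona(f):
    """Check f_i <= f_{i-1}^{(i)} for i = 1, ..., d, where f = (f_0, ..., f_d)."""
    return all(f[i] <= upper(f[i - 1], i) for i in range(1, len(f)))

print(cascade(5, 2))                           # 5 = B(3,2) + B(2,1)
print(satisfies_kruskal_katona((4, 6, 4, 1)))  # complete complex on 4 points
print(satisfies_kruskal_katona((4, 6, 5)))     # too many triangles
```

For the complete complex with f-vector (4,6,4,1), every inequality is an equality, which illustrates the sharpness. The vector (4,6,5) is rejected because 6 edges can span at most 4 triangles.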


182. COMPUTATIONAL COMPLEXITY 


An NP decision problem has a probabilistically checkable proof (PCP) if given any 
probability p < 1, there exists a polynomial f such that every mathematical proof of length n 
can be rewritten as a proof of length f(n) that can be formally verified with accuracy 
p. The latter means that one can formally verify p * f(n) letters of the proof of an NP decision 
problem. Examples of NP hard decision problems are the traveling salesperson problem, 
the knapsack decision problem and clique problems in graphs. The PCP theorem is: 


Theorem: Every NP decision problem has a probabilistically checkable proof. 


To cite [173]: "Every language in NP has a witness format that can be checked probabilistically 
by reading only a constant number of bits from the proof. The celebrated equivalence of this 
theorem and inapproximability of certain optimization problems, due to Feige et al. 1996, has 
placed the PCP theorem at the heart of the area of inapproximability. " 

The theorem has been proven by various mathematicians, starting in 1990 with Laszlo Babai,
Lance Fortnow and Carsten Lund. More work was done by Sanjeev Arora and Shmuel Safra
from 1998. The theorem is considered one of the most important results in complexity theory
as it shows that certain problems can have no polynomial-time approximation schemes unless
P = NP. See [683].


183. FENCHEL DUALITY THEOREM 


In the theory of convex analysis, one can look at convex bounded continuous functions
$f: X \to \mathbb{R}$ on a Banach space $X$ and $g: Y \to \mathbb{R}$ on another Banach space $Y$, and at a
bounded linear map $A: X \to Y$, to compute $p^* = \inf_{x \in X} f(x) + g(Ax)$ and
$d^* = \sup_{y^* \in Y^*} -f^*(A^* y^*) - g^*(-y^*)$, where $f^*, g^*$ denote the convex conjugates of $f$ and $g$.
If $X^*, Y^*$ are the dual Banach spaces of $X, Y$ and $A^*: Y^* \to X^*$ is the adjoint map
$\langle y^*, Ax \rangle = \langle A^* y^*, x \rangle$ for the pairings of $Y$ with $Y^*$ and of $X$ with $X^*$, then the strong
duality theorem of Fenchel states:


Theorem: p* = d* 


The theorem is due to Werner Fenchel [71]. It can be generalized, allowing for milder regularity 
and even unbounded functions $f,g$ but then only the weak duality result $p^* \geq d^*$ holds.


184. LEGENDRE TRANSFORM DUALITY 


The Legendre transform of a convex function $f : X \to \mathbb{R}$ defined on a convex set $X$ in
$\mathbb{R}^n$ with inner product $\langle x,y \rangle$ is defined as the function
$f^*(x^*) = \sup_{x \in X} \langle x^*,x \rangle - f(x)$ on $X^* = \{x^* : \sup_{x} \langle x^*,x \rangle - f(x) < \infty\}$.
The convex function $f^*$ on $X^*$ is also called the convex conjugate of $f$. One has the following
duality result:


Theorem: f** = f. 


In the simplest one-dimensional case, convexity means $f''(x) > 0$. The derivative condition
$(\langle x^*, x \rangle - f(x))' = 0$ means $x^* = f'(x)$ so that the supremum is attained at $x = g(x^*) = (f')^{-1}(x^*)$.
For $f(x) = e^x$, one has $f^*(x^*) = x^* \log(x^*) - x^*$ and for $f(x) = x^2$ one has $f^*(x^*) = (x^*)^2/4$.
For the function $f(x) = e^{x-1} = y$ one has $y = f'(x) = e^{x-1}$ and $x = 1 + \log(y)$ so that
$f^*(x^*) = x x^* - x^* = (1 + \log(x^*)) x^* - x^* = x^* \log(x^*)$, which up to the sign is the function
$-x^* \log(x^*)$ appearing when defining entropy. See [565].
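
The closed forms above can be checked numerically. The following is a minimal sketch, assuming a plain grid search for the supremum on a bounded interval; the function name `legendre` and the parameters `lo`, `hi`, `steps` are illustrative choices and not part of the text.

```python
import math

def legendre(f, x_star, lo=-20.0, hi=20.0, steps=200_001):
    """Approximate f*(x*) = sup_x (x* x - f(x)) by a grid search on [lo, hi].

    Assumption: the supremum is attained inside [lo, hi], which holds for the
    three examples below.
    """
    h = (hi - lo) / (steps - 1)
    return max(x_star * (lo + i * h) - f(lo + i * h) for i in range(steps))

# Compare with the closed forms from the text at the sample point x* = 2.
s = 2.0
conj_exp = legendre(math.exp, s)                        # x* log(x*) - x*
conj_square = legendre(lambda x: x * x, s)              # (x*)^2 / 4
conj_shifted = legendre(lambda x: math.exp(x - 1), s)   # x* log(x*)
```

A grid search is crude but makes the definition of the conjugate as a supremum directly visible; for smooth strictly convex f one could instead solve x* = f'(x).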


185. GERSHGORIN CIRCLE THEOREM


Let $A$ be a complex $n \times n$ matrix and denote by $\lambda_j(A)$ the eigenvalues of $A$. These are the solutions
to the polynomial equation $p(\lambda) = \det(A - \lambda I) = 0$ of degree $n$. By the fundamental theorem
of algebra, there are exactly $n$ eigenvalues, counted with multiplicity. If $R_i = \sum_{j \neq i} |A_{ij}|$ is the
$l^1$ norm of the $i$'th row vector with the diagonal entry $|A_{ii}|$ missing, the disk $G_i = B_{R_i}(A_{ii})$ is
called a Gershgorin disc. The Gershgorin circle theorem is a result in matrix theory.


Theorem: Every eigenvalue $\lambda_j$ lies in at least one Gershgorin disk.


The theorem can also be seen as a perturbation result because if $A$ is a diagonal matrix,
then the Gershgorin discs have radius 0. The result can be
used to estimate how much the eigenvalues can deviate if such a matrix is perturbed. The result
can also be used to estimate the determinant $\det(A) = \prod_j \lambda_j$ of $A$. A special case, attributed
by Gershgorin to Bendixson and Hirsch, is that $|\lambda_j| \leq n \max_{1 \leq i,j \leq n} |A_{ij}|$. The result can also
be used to estimate the error when computing solutions $Ax = b$ of linear equations. This is
useful in numerical methods, for example when expressing the error of $x$ in terms of the error in $A, b$
using the condition number $||A^{-1}|| \, ||A||$ of $A$. Gershgorin also mentions the corollary that if
$|A_{ii}| > \sum_{j \neq i} |A_{ij}|$ for all $i$, then the matrix $A$ is invertible. The result was found by Semyon
Aranovich Gershgorin in 1931. See [667]. This book also contains a copy of Gershgorin's paper
from 1931. (In that original article, Gershgorin writes his name as Gerschgorin.)
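
As an illustration of the theorem, here is a small numerical sketch, assuming NumPy is available; the helper names `gershgorin_discs` and `eigenvalues_covered` and the tolerance `tol` are choices made for this example.

```python
import numpy as np

def gershgorin_discs(A):
    """Return centers A_ii and radii R_i = sum_{j != i} |A_ij| of the discs."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

def eigenvalues_covered(A, tol=1e-9):
    """Check that every eigenvalue lies in at least one Gershgorin disc."""
    centers, radii = gershgorin_discs(A)
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=complex))
    return all(np.any(np.abs(lam - centers) <= radii + tol) for lam in eigenvalues)

# A strictly diagonally dominant example: no disc contains 0, so A is invertible.
A = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 1.0],
              [1.0, 1.0, 5.0]])
centers, radii = gershgorin_discs(A)
covered = eigenvalues_covered(A)
```

The last example also illustrates the corollary mentioned above: when |A_ii| exceeds R_i for every row, no disc contains 0 and so 0 is not an eigenvalue.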


186. THE CANADA DAY THEOREM


For any symmetric $n \times n$ matrix $A$, the sum of all $k \times k$ minors $\det(A_{I \times J})$ with $|I| = |J| = k$ of
$A$ is equal to the sum of the principal $k \times k$ minors $\det((TA)_{I \times I})$ of the matrix $TA$, where
$T$ is the lower triangular $n \times n$ matrix with $T_{kk} = 1$ in the diagonal, $T_{kl} = 2$ for $k > l$
and $T_{kl} = 0$ for $k < l$. The notation is that if $I, J$ are subsets of $\{1,...,n\}$ with cardinality
$k$, then $P = I \times J$ is the product set which defines the $k \times k$ matrix $A_{I \times J}$ in which only the
elements in the pattern $P$ appear. The minor is then defined as the determinant $\det(A_{I \times J})$ of
that sub-matrix.


Theorem: $\sum_{|I|=|J|=k} \det(A_{I \times J}) = \sum_{|I|=k} \det((TA)_{I \times I})$.


For example, if $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ and $T = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$, then for $k = 2$, this means $\det(A) = \det(TA)$.
For $k = 1$, the theorem can be verified by computing $TA = \begin{bmatrix} a & b \\ 2a+b & 2b+c \end{bmatrix}$ and checking
$a+b+b+c = a+(2b+c)$. The paper appeared first in and was published in [322]. Since
the peak of the discovery appeared on July 1, 2008, which is Canada Day, the name stuck.
The proof of the result uses the Cauchy-Binet theorem which reduces it to showing


$\sum_{|I|,|J|=k} \det(A_{I,J}) = \sum_{|I|,|J|=k} \det(T_{I,J}) \det(A_{J,I})$.


Now, $\det(T_{I,J})$ is $2^{p(J,I)}$ if $J \leq I$ and 0 otherwise, where $p(J,I) = |J \setminus (I \cap J)|$.
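
The identity can be tested numerically for small matrices. The sketch below assumes NumPy; the function name `minor_sums` is a hypothetical helper that evaluates both sides of the theorem by brute force over all index subsets.

```python
from itertools import combinations

import numpy as np

def minor_sums(A, k):
    """Return (sum of all k x k minors of A, sum of principal k x k minors of TA)."""
    n = A.shape[0]
    # T has 1 on the diagonal, 2 below the diagonal and 0 above it.
    T = np.eye(n) + 2 * np.tril(np.ones((n, n)), -1)
    TA = T @ A
    subsets = list(combinations(range(n), k))
    all_minors = sum(np.linalg.det(A[np.ix_(I, J)]) for I in subsets for J in subsets)
    principal = sum(np.linalg.det(TA[np.ix_(I, I)]) for I in subsets)
    return all_minors, principal

# Random symmetric integer test matrix.
rng = np.random.default_rng(0)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)
A = B + B.T
results = [minor_sums(A, k) for k in range(1, 5)]
```

For k = 1 the two numbers are the sum of all entries of A and the trace of TA, matching the 2 x 2 computation above.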


187. NASH EMBEDDING THEOREM


A Riemannian $m$-dimensional manifold $(M,g)$ is isometrically embedded in $\mathbb{R}^n$ if there is
an injective smooth map $\phi: M \to \mathbb{R}^n$ that is an isometry. This means that $g(u,v) =
\langle d\phi(u), d\phi(v) \rangle$ for all $u,v \in TM$ for the Riemannian metric $g$ of $M$. Let us say the
$(m,n)$-embedding problem can be solved for $M$ or an $(m,n)$-embedding is possible, if an
isometric smooth embedding into $\mathbb{R}^n$ can be achieved for every compact Riemannian manifold
$(M,g)$ of dimension $m$. The Nash embedding theorem is


Theorem: An $(m,n)$-embedding is possible for $n \geq 1.5m^2 + 5.5m$.


For non-compact manifolds $(M,g)$, an isometric embedding needs the dimension $n$ of the Euclidean
space to be a bit larger. It is possible if $n \geq 1.5m^3 + 7m^2 + 5.5m$. These constants
appeared in the original 1955 paper of Nash (reprinted in [426] Chapter 11). The embedding
cannot work for $n < 0.5m^2 + 0.5m$ because the right hand side is the number of degrees of freedom
of the Riemannian metric tensor at a point. Nash's paper also includes some history: Ludwig Schläfli
in 1871 conjectured an embedding in $n \geq 0.5m^2 + 0.5m$ but Hilbert in 1901 showed that a
constant negative curvature manifold can not be embedded in $\mathbb{R}^3$. Chern and Kuiper in 1952
showed that the flat torus $(T^n, d)$ can not be embedded in $\mathbb{R}^{2n-1}$. This is sharp because for
even $n$, the Clifford torus is an embedding in $\mathbb{R}^{2n}$ using that $T^1$ has an isometric embedding
in $\mathbb{R}^2$. For local embeddings, Élie Cartan was able to verify in 1927 (following work of M. Janet
in 1926) that the Schläfli constant works. A modern proof of the Nash embedding theorem
uses the Nash-Moser inverse function theorem (combining the method of Nash from 1955
and a paper of J. Moser of 1966, who fashioned it into an abstract theorem in functional
analysis [423]). The Nash embedding theorem is much harder than the Whitney embedding
theorem which solves the embedding problem without insisting that $\phi$ is an isometry.
In that case, $n \geq 2m$ is possible. For a more recent simplification of the proof which also improves
the constant to $n \geq \max(0.5m^2 + 2.5m, 0.5m^2 + 1.5m + 5)$, see [280]. The local embedding is
first solved based on the Cauchy-Kowalevski theorem for partial differential equations in
an analytic setting. The problem is considerably harder in the smooth case and this is where
an iterative smoothing process is already needed.


188. ERDŐS STRAUS RELATION


The Diophantine equation 4/n = 1/x + 1/y + 1/z for unknown positive integers x, y, z, n
is called the Erdős Straus relation. It is equivalent to 4xyz = n(xy + xz + yz). One only
needs to study this in the case when n is prime because if 4/p = 1/a + 1/b + 1/c is solved, then
4/(pq) = 1/(aq) + 1/(bq) + 1/(cq). As the equation can be solved modulo any prime, by the
Hasse principle one should be able to get solutions for any n; but this is still unknown. It
can appear silly to put the following as a "theorem" because it is "obvious" (or "trivial" to use
a curse word), once one sees it, but it illustrates that the difficulty of a Diophantine problem
can be hard to judge, if one sees it for the first time.


Theorem: If n+1 is divisible by 3, then the Erdős Straus equation is solvable.


There is an easy explicit solution formula which one can look up, but which can be fun to search
for, but only if one has not seen it yet. The Erdős-Straus conjecture or 4/n problem states
that for all integers n larger than 1, the rational number 4/n can be expressed as the sum
of three positive unit fractions. Paul Erdős and Ernst G. Straus formulated the conjecture in
1948. The problem is still open. Related is a conjecture of Sierpinski, the conjecture that
5/n = 1/x + 1/y + 1/z can be solved [283]. These problems have appeal because they tap
into an old theme of Egyptian fractions which already appear on the Rhind papyrus from
around 1650 BC. On that document, numbers 2/n were written as Egyptian fractions for all
odd numbers n between 5 and 101. Another interesting problem is to count or estimate the
number f(n) of solutions of the 4/n problem. In [205], the sum $S(n) = \sum_{p \leq n, p \; {\rm prime}} f(p)$ is
bounded both from below and above by $n \log^2(n) \ll S(n) \ll n \log^2(n) \log\log(n)$.
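
For readers who want to experiment with the 4/n problem, here is a minimal brute-force sketch using exact rational arithmetic. The function name `erdos_straus` and the search bounds are choices made for this illustration; the search does not use the explicit formula alluded to above.

```python
from fractions import Fraction

def erdos_straus(n):
    """Search for positive integers x <= y <= z with 4/n = 1/x + 1/y + 1/z.

    Returns a triple (x, y, z) or None if the search finds nothing.
    """
    target = Fraction(4, n)
    # x is the smallest denominator, so 1/x < 4/n <= 3/x.
    for x in range(n // 4 + 1, 3 * n // 4 + 1):
        rest = target - Fraction(1, x)
        # y is the next denominator, so 1/y < rest <= 2/y.
        y_low = max(x, rest.denominator // rest.numerator + 1)
        y_high = 2 * rest.denominator // rest.numerator
        for y in range(y_low, y_high + 1):
            last = rest - Fraction(1, y)
            if last > 0 and last.numerator == 1:
                return x, y, last.denominator
    return None

solutions = {n: erdos_straus(n) for n in range(2, 200)}
```

Counting all triples instead of returning the first one would give the function f(n) mentioned above, at a higher cost.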


189. DIEUDONNE DETERMINANT 


If $A$ is an $n \times n$ matrix with entries in a not necessarily commutative ring like the quaternions
$\mathbb{H}$, one can still look at the Leibniz determinant $\det(A) = \sum_\sigma {\rm sign}(\sigma) A_{1\sigma(1)} \cdots A_{n\sigma(n)}$. This
is a sum over all permutations $\sigma$ of $\{1,...,n\}$, where ${\rm sign}(\sigma)$ is the signature of $\sigma$. It does not
satisfy the Cauchy-Binet identity $\det(AB) = \det(A)\det(B)$ in general. There are two ways
to get a determinant which satisfies the latter: the first one is called the Study determinant
[642]. It is a real-valued determinant defined if $R$ is a real normed division ring, meaning
$|ab| = |a||b|$. The second is the Dieudonné determinant [169] which takes values in the
Abelianization $R/[R,R]$ of the division ring (it is obtained from the multiplicative group of $R$
by factoring out all commutators $aba^{-1}b^{-1}$). The
Dieudonné determinant has the property that it agrees with the Leibniz determinant in the
commutative case like $R = \mathbb{R}$ or $R = \mathbb{C}$. The Study determinant is a bit easier to compute because
we do not bother with commutators and can go directly to the norm. Both determinants rely
on the ability to do row reduction, which requires that one can divide from the left or from
the right. They work especially in all normed real division algebras $\mathbb{R}, \mathbb{C}, \mathbb{H}, \mathbb{O}$, where in the
quaternion and octonion case, the Study and the Dieudonné determinant agree. The axiomatic
definition of the Dieudonné determinant is by asking it to take values in the Abelianization and
demanding for example $\det(A)\det(B) = \det(AB)$ and $\det(A) = \prod_i A_{ii}$ if $A$ is upper triangular.


Theorem: For a division ring, there is a unique Dieudonné determinant. 


It follows from the axioms that $\det(1) = 1$ and that $\det(1 + E_{ij}) = 1$, if $1 + E_{ij}$ is the elementary
0-1 matrix which is 0 everywhere except in the diagonal and at the entry $ij$, where the value
is 1. It also follows that $\det(\lambda A) = \lambda \det(A)$ so that row reduction allows to compute the
determinant depending on whether $\overline{(-1)} = 1$ or not. It also follows that $\det(A) = 0$ if and only
if $A$ is singular because that is equivalent to having $A$ row reduce to a triangular matrix with
a zero in the diagonal. For quaternions for example $\overline{(-1)} = 1$ because $iji^{-1}j^{-1} = kk = -1$.
Because $SU(2)$ has a trivial Abelianization, one has $\overline{q} = |q|$ for quaternions. In order to show
the existence of the determinant, one can use row reduction and note that for $n \geq 2$, any
diagonal entry $aba^{-1}b^{-1}$ can be morphed into 1 using row reduction steps. One can verify the
product property by writing the matrix $A$ as a product of elementary matrices and abelianized


ring elements. The Dieudonné determinant is treated in O08}. 


190. CENTROID THEOREM 


The surface area of a surface $S$ of revolution in $\mathbb{R}^3$ obtained by rotating a piecewise smooth
curve $\Gamma$ around the axis of symmetry $L$ is equal to the arc length $|\Gamma|$ times the length $|C|$
of the circle which is traced by the geometric centroid of $\Gamma$. This is the Pappus surface
centroid theorem and it can be written as $|S| = |\Gamma||C|$. Similarly, the volume $|E|$ of a solid
of revolution $E$ obtained by taking the union of all projection lines from $S$ to $L$ is equal to the
area $|A|$ of the flat lamina $A$ between $L$ and $\Gamma$, multiplied by the arc length $|C|$ of the circle
which the centroid of $A$ traces when rotated around $L$. The Pappus solid centroid theorem
is then the formula $|E| = |A||C|$. This can be generalized: let $C$ be a finite curve connecting
two points $P$ and $Q$ and let $A$ be a bounded closed region with smooth boundary $\Gamma$ contained
in the plane perpendicular to the curve at $P$ such that $P$ is the centroid of $A$. The region $A$
can be transported along $C$ using the Frenet frame and defines a solid $E$ with boundary $S$.
We assume that the tube $S$ remains smooth and is a smooth embedding of a 2-dimensional
cylinder in $\mathbb{R}^3$.


Theorem: For surface area $|S| = |\Gamma||C|$, for volume $|E| = |A||C|$.


For example, if $\Gamma$ is a half circle of radius $r$ in the $xz$-plane connecting $P = (0,0,-r)$ with
$Q = (0,0,r)$ and $L$ is the $z$-axis, then $|\Gamma| = \pi r$ and $|C| = 2\pi(2r/\pi) = 4r$ so that the surface
area of the sphere $S$ is $4\pi r^2$. The lamina $A$ is a half disc in the $xz$-plane of radius $r$ which has
area $|A| = \pi r^2/2$. The centroid of $A$ has distance $d = 4r/(3\pi)$ from $L$, moving on a circle
of length $|C| = 2\pi d = 8r/3$. The volume of the ball of radius $r$ therefore is $|A||C| =
(\pi r^2/2)(8r/3) = 4\pi r^3/3$. The result of Pappus is also used to compute the surface area and
volume of tubes. Here is another example: if $C$ is a smooth closed curve in $\mathbb{R}^3$ such that
the tube $\bigcup_{x \in C} B_r(x)$ forms a solid $E$ with piecewise smooth boundary surface $S$ that does
not have any self intersection, then the surface area is $|S| = |\Gamma||C| + 4\pi r^2 = 2\pi r|C| + 4\pi r^2$ and the
volume is $|E| = |A||C| + 4\pi r^3/3$ (the additional terms come from the sphere "roundings at the
end points"). In this case, the laminae $A$ are disks of area $\pi r^2$ and the curves $\Gamma$ are circles
of arc-length $2\pi r$. Even more general versions have been discussed in detail in [258]. For tube
methods in differential geometry also in higher dimensions (which are certainly also inspired
by the Pappus centroid theorem), see [267]. Hermann Weyl used tubes as a powerful tool in
differential geometry [689].
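
The sphere example can be reproduced numerically. The sketch below computes the two centroids by a midpoint rule instead of using their closed forms; the function name `pappus_sphere` and the number `n` of subdivisions are assumptions of this illustration.

```python
import math

def pappus_sphere(r, n=10_000):
    """Return (surface area, volume) of the sphere of radius r via the Pappus formulas."""
    dt = math.pi / n
    angles = [-math.pi / 2 + (i + 0.5) * dt for i in range(n)]  # midpoint rule

    # Half circle: the arc element r*dt at angle t has distance r*cos(t) from the z-axis.
    arc_length = r * dt * n
    arc_centroid = sum(r * math.cos(t) * r * dt for t in angles) / arc_length
    surface = arc_length * 2 * math.pi * arc_centroid

    # Half disc: thin sectors of area r^2*dt/2 whose centroids have
    # distance (2r/3)*cos(t) from the z-axis.
    area = 0.5 * r * r * dt * n
    moment = sum((2 * r / 3) * math.cos(t) * 0.5 * r * r * dt for t in angles)
    volume = area * 2 * math.pi * (moment / area)
    return surface, volume

surface, volume = pappus_sphere(2.0)
```

The computed centroid distances approximate 2r/π for the arc and 4r/(3π) for the half disc, the values used in the text.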


191. THE BORSUK ANTIPODAL THEOREM 


Let $M = S^n$ denote the $n$-sphere $\{|x|^2 = 1\} \subset \mathbb{R}^{n+1}$ equipped with the metric induced from
the Euclidean space $\mathbb{R}^{n+1}$. Let $A_0, \dots, A_n$ be a cover of $M$ by closed sets. This means
that $\bigcup_{k=0}^n A_k = S^n$. We say, a subset $A$ of $M$ contains an antipodal pair, if there is a pair of
points $\{x, -x\} \subset M$ which are both in $A$.


Theorem: In a cover of the n-sphere by n+1 closed sets, one of the sets contains an antipodal pair.


The theorem is also known as the Lusternik-Schnirelman-Borsuk antipodal theorem
(already called so by [827]). Much of the literature just calls it the Borsuk theorem, maybe
because of simplicity. The theorem is equivalent to the Borsuk-Ulam theorem stating that
every continuous map $f$ from $M$ to $\mathbb{R}^n$ has the property that some antipodal pair $x, y$ has the property
that $f(x) = f(y)$. Stan Ulam was credited in the Borsuk paper as the originator of the
problem. Section 41 of [69] contains elegant proofs that the statements in Borsuk's theorem
and in the Borsuk-Ulam theorem are equivalent. The result generalizes to the situation when $M$ is
a manifold homeomorphic to $S^n$ equipped with an involution $T: M \to M$ which is conjugated
to the antipodal map on $S^n$. See Section 150. For $n = 1$, the theorem is equivalent to the
intermediate value theorem: $M$ is a circle and the function $f(x) - f(x')$, if not constant
0, takes both positive and negative values so that there must be a point where $f(x) = f(x')$
with antipodal point $x'$. For $n = 2$, if we cover the 2-sphere with 3 closed sets, there is one of
the sets which contains an antipodal pair. The more surprising equivalent Borsuk-Ulam statement is
then that there are two antipodes on earth, where both the temperature and the pressure are
the same. The theorem appeared first in 1930 in a paper by Lusternik and Schnirelman and
then more generally in 1933 by Karol Borsuk [74]. The fact that there is a general theorem on
Lusternik-Schnirelman category by Lusternik and Schnirelman is a reason to stick to Borsuk for
the antipodal theorem. Heinz Hopf generalized the theorem in 1944 as follows: if $A_0, \dots, A_n$
are $n+1$ closed sets covering the unit sphere $S^n$ in $\mathbb{R}^{n+1}$ and $0 < d \leq 2$ is a distance, then there
exists a set $A_k$ in which there exist two points of distance $d$. The special case $d = 2$ is the
Borsuk theorem. Hopf notes that this implies that if the n-sphere is covered by n+2 non-empty
closed sets such that none of them contains an antipodal pair, then every collection of n+1
sets has a non-empty intersection and states in a footnote that this means that the nerve of
the cover $F_0, \dots, F_{n+1}$ is then isomorphic to the boundary complex of an (n+1)-dimensional
simplex.


192. ZAGIER’S INEQUALITY 


Assume $f,g$ are non-negative and decreasing functions on $[0,T]$. They are then automatically
integrable. Denote by $E[f] = \frac{1}{T} \int_0^T f(x) \, dx$ the average of $f$.


Theorem: If $f,g$ are non-negative and decreasing, then $E[fg] \geq E[f]E[g]$.


formulates this more generally as follows: if $f,g$ are decreasing and non-negative on $[0,\infty)$
and $F,G \in L^1([0,\infty))$ take values in $[0,1]$, then $(f,g) \geq (f,F)(g,G)/\max(I(F),I(G))$, where
$I(F) = \int_0^\infty F(x) \, dx = |F|_1$.

The Zagier inequality has also been called an anti-Cauchy-Schwarz inequality because in
Cauchy-Schwarz $|f \cdot g| \leq |f||g|$ in a Hilbert space, the inequality works in the opposite
direction. In [69], the inequality on finite intervals is called Chebychev's inequality but the
latter should maybe be reserved for the inequality $P[|X - E[X]| \geq \epsilon] \leq {\rm Var}[X]/\epsilon^2$ for a random
variable $X \in L^2(\Omega, \mathcal{A}, P)$ on a probability space $(\Omega, \mathcal{A}, P)$. The Zagier inequality also
works for decreasing sequences $f_k$, where $E[f] = \frac{1}{n} \sum_{k=1}^n f_k$ is the Birkhoff average. Now,
the same statement $E[fg] \geq E[f]E[g]$ holds. In the simplest case, for $f = (a,b)$ and $g = (c,d)$,
this is equivalent to $2(ac + bd) \geq (a+b)(c+d)$ for $a \geq b, c \geq d$ which is already not totally
obvious as it is equivalent to $ac + bd \geq ad + bc$.
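
The sequence version is easy to test. The following sketch checks the inequality for one hand-picked pair and for random decreasing sequences; the helper names `mean` and `zagier_gap` are introduced only for this example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def zagier_gap(f, g):
    """Return E[fg] - E[f]E[g] for two finite sequences of equal length."""
    return mean([a * b for a, b in zip(f, g)]) - mean(f) * mean(g)

# Simplest case f = (a, b), g = (c, d) with a >= b and c >= d.
small_gap = zagier_gap([3, 1], [5, 2])

# Random non-negative decreasing sequences.
random.seed(0)
gaps = []
for _ in range(100):
    f = sorted((random.random() for _ in range(20)), reverse=True)
    g = sorted((random.random() for _ in range(20)), reverse=True)
    gaps.append(zagier_gap(f, g))
```

In the two-term case the gap equals (ac + bd - ad - bc)/4 = (a - b)(c - d)/4, which makes the non-negativity visible.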


193. GINI COEFFICIENT 


If $x_1, \dots, x_n$ are non-negative real numbers with mean $m = \frac{1}{n} \sum_{k=1}^n x_k$, the number $G =
\frac{1}{2mn^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|$ is called the Gini coefficient of the data. Using $|a - b| = a + b -
2\min(a,b)$ it can be rewritten as $G = 1 - \frac{1}{mn^2} \sum_{i,j=1}^n \min(x_i, x_j)$. A common interpretation
is that $x_k$ is the income of person $k$ in a population $X = \{1,...,n\}$ of $n$ people. The number $m$
is then the mean income of the population. If a population $X$ of $n$ people is split into smaller
groups $X_k, k = 1,...,r$ of size $n_k$ with mean income $m_k$, then $\sum_{k=1}^r n_k = n$ and $\sum_{k=1}^r n_k m_k =
nm$. If $G(X)$ is the Gini coefficient of $X$ and $G(X_k)$ the Gini coefficient of the sub-population
$X_k$, then


Theorem: $nG(X) \geq \sum_{k=1}^r n_k G(X_k)$.

This and many more inequalities relating $G(X)$ with $G(X_k)$ appear in [710]. There is also
a continuum analog: for a probability density function $f$ on $[0,\infty)$ with $\int_0^\infty f(x) \, dx =
1$, $m = \int_0^\infty x f(x) \, dx$, the continuum Gini coefficient is defined as $G = \frac{1}{2m} \int_0^\infty \int_0^\infty |x -
y| f(x) f(y) \, dx dy$ which is equivalent to $G = 1 - \frac{1}{m} \int_0^\infty \int_0^\infty \min(x,y) f(x) f(y) \, dx dy$. The Gini
coefficient is twice the area between the Lorenz curve and the diagonal: $G = 2 \int_0^1 p - L(p) \, dp$,
where $p = \int_0^x f(t) \, dt$ is the cumulative distribution value and $L(p) = \frac{1}{m} \int_0^x t f(t) \, dt$. In the
context of income inequality, where the subject has come up in economics, $L(p)$ represents
the fraction of the total income which is earned by the poorest $np$ people. The graph of $L(p)$
is a convex curve from $(0,0)$ to $(1,1)$, the slope $L'(p)$ being the relative income in the
corresponding percentile of the population. The Gini coefficient is also called Gini index. It has
been introduced by Corrado Gini in 1912. It is a natural quantity because on the real line,
for the Green's function of the Laplacian $-\Delta/2$ with $\Delta f = f''$, one has $g(x,y) = |x - y|$. The
potential $V(x) = |x|$ is the natural "Newton potential". For $M = \mathbb{R}^d$ in dimension $d \neq 2$ it
is $g(x,y) = |x-y|^{2-d}$ for the Laplacian $-\Delta/|S_{d-1}|$, where $|S_k|$ is the volume of the $k$-dimensional
unit sphere; it is the logarithmic potential $\log|x|/(2\pi)$ in dimension $d = 2$. The most familiar
case is the 3-dimensional Euclidean space $\mathbb{R}^3$, where the Newton potential $1/|x|$ appears
in electromagnetism and gravity. The Gini potential $|x|$ is roughly the force between
two planar parallel mass sheets like two galaxies rotating around the same axis. In general,
for any Riemannian manifold $M$ with Green's function $g(x,y)$ (the inverse of the Laplacian)
and measure $\mu$ (mass distribution), the integral $I(\mu) = \int_M \int_M g(x,y) \, d\mu(x) d\mu(y)$ is the
potential theoretical energy of the measure $\mu$. The Gini index therefore is proportional to
the potential theoretical energy for a mass distribution with density $\mu = f(x) dx$ on $[0,\infty)$.
The above inequality could therefore be interpreted as an inequality for the potential energy
of particles which are partitioned into non-interacting groups. Switching off energies between
non-interacting parts lowers the energy.
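
The two discrete formulas and the inequality for split populations can be illustrated on small data sets. In this sketch the function names `gini` and `gini_min` and the sample incomes are made up for the illustration.

```python
def gini(xs):
    """Gini coefficient G = sum_{i,j} |x_i - x_j| / (2 m n^2)."""
    n = len(xs)
    m = sum(xs) / n
    return sum(abs(a - b) for a in xs for b in xs) / (2 * m * n * n)

def gini_min(xs):
    """Equivalent form G = 1 - sum_{i,j} min(x_i, x_j) / (m n^2)."""
    n = len(xs)
    m = sum(xs) / n
    return 1 - sum(min(a, b) for a in xs for b in xs) / (m * n * n)

incomes = [1, 2, 3, 10]
g1, g2 = gini(incomes), gini_min(incomes)

# A population split into two groups, as in the theorem.
groups = [[0, 1], [1000, 1000]]
population = [x for group in groups for x in group]
lhs = len(population) * gini(population)
rhs = sum(len(group) * gini(group) for group in groups)
```

In the split example the second group is perfectly equal, so only the first group contributes to the right hand side.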


194. DENJOY-KOKSMA THEOREM 


Let $T: \Omega \to \Omega$ be an ergodic automorphism of a probability space $(\Omega, \mathcal{A}, \mu)$. (Automorphism
means $\mu(T(A)) = \mu(A)$ for all $A \in \mathcal{A}$ and ergodic means that $T(A) = A$ implies $\mu(A) \in \{0,1\}$.)
The Birkhoff ergodic theorem assures that for all $g \in L^1(\Omega)$ and almost every $x \in \Omega$ we
have $S_n(x)/n \to E[g] = \int_\Omega g(x) \, d\mu(x)$ with the Birkhoff sum $S_n(x) = \sum_{k=0}^{n-1} g(T^k x)$. An example
dynamical system is the irrational rotation $T: x \to x + \alpha$ on the circle $\mathbb{T}^1 = \mathbb{R}^1/\mathbb{Z}^1$ equipped
with the Lebesgue measure $\mu = dx$. Denjoy-Koksma theory estimates the growth of $S_n(x)$
depending on Diophantine properties of $\alpha$ and regularity properties of $g$. A real number
$\alpha$ is called Diophantine, if there exists a constant $C > 0$ such that $|q\alpha - p| \geq C/q$ for all integers
$p, q$ with $q > 0$. A function $g$ has bounded variation if ${\rm Var}(g) = \sup_P \sum_i |g(x_{i+1}) - g(x_i)|$ is finite, where
the supremum is over all finite sets $P = \{x_0, \dots, x_n = x_0\}$ in $\mathbb{T}^1$. In the simplest case, the
Denjoy theorem says for $g$ with $E[g] = 0$ that $|S_n| \leq C \log(n) {\rm Var}(g)$ for all $n$ and that there is a sequence of integers
$q_n$, for which $|S_{q_n}(x)| \leq {\rm Var}(g)$, the denominators of the periodic approximations $p_n/q_n \to \alpha$. For $r > 1$, a real
number $\alpha$ is called r-Diophantine, if $|q\alpha - p| \geq C q^{-r}$ for all integers $p, q$ with $q > 0$. The Denjoy-Koksma
theorem was generalized in 1999 by Svetlana Jitomirskaya to


Theorem: If $\alpha$ is r-Diophantine, then $|S_n| \leq C n^{1-1/r} \log(n) {\rm Var}(g)$.


For a periodic approximation $p/q$ of $\alpha$ one has $|S_q| \leq {\rm Var}(f)$: to see this divide $\mathbb{T}^1$
into $q$ intervals $I_k$ centered at $y_k = kp/q$. The intervals have length $1/q + O(1/q^2)$ and each
contains exactly one point of the orbit. Renumber the points to have $y_j$ in $I_j$. By the intermediate value
theorem, there exists a Riemann sum $\frac{1}{q} \sum_{i} f(x_i) = \int f(x) \, dx = 0$ for which every $x_i$ is in
an interval $I_i$. Choosing $x_i$ with $f(x_i) = \min_{x \in I_i} f(x)$ gives a lower and $f(x_i) = \max_{x \in I_i} f(x)$ gives an upper
bound. Now, $|\sum_{j} f(y_j) - f(x_j)| \leq \sum_{j} |f(y_j) - f(x_j)| + |f(x_j) - f(y_{j+1})| \leq {\rm Var}(f)$. Therefore,
if $q_k \leq n < q_{k+1}$ and $n = b_k q_k + b_{k-1} q_{k-1} + \cdots + b_1 q_1 + b_0$, then $|S_n| \leq (b_0 + \cdots + b_k) {\rm Var}(f)$,
where $b_i \leq q_{i+1}/q_i$. So, $|S_n| \leq \sum_{i=0}^{k} \frac{q_{i+1}}{q_i} {\rm Var}(f)$. If $\alpha$ is r-Diophantine, then $|q\alpha - p| \geq c/q^r$ and
$q_{i+1} \leq q_i^r/c$. Because $n \leq q_{k+1} \leq q_k^r/c$, we have $q_k \geq (cn)^{1/r}$ and $n/q_k \leq c^{-1/r} n^{1-1/r}$. Because
$k \leq 2\log(q_k)/\log(2)$, the claim follows. For $r = 1$, see [148] (page 84). In general, see [354].


195. QUADRILATERAL THEOREM 


Let ABCD denote a convex quadrilateral in $\mathbb{R}^2$. Alternatively, the four arbitrary points
A, B, C, D in $\mathbb{R}^3$ define a tetrahedron. Assume the side lengths are a = |AB|, b = |BC|, c =
|CD|, d = |DA| and that the diagonal lengths are e = |AC|, f = |BD|. Let M = (A+C)/2
and N = (B+D)/2 be the midpoints of the diagonals and g = 2|MN|. The Euler law on
quadrilaterals is


Theorem: $a^2 + b^2 + c^2 + d^2 = e^2 + f^2 + g^2$.


One can verify this by just expanding out what one gets when writing the condition in
coordinates. The proof then shows that the statement also gives a statement about lengths of a
tetrahedron in space $\mathbb{R}^3$: if a, b, c, d, e, f are the side lengths of an arbitrary tetrahedron in space
and the edges L, M belonging to e, f have no common point, and g is twice the length between the
midpoints of the two segments L and M, then the same relation holds in space. This has been
noted in [364]. In the case of a rectangle, where $a = c, b = d, g = 0, e^2 = f^2 = a^2 + b^2$,
one has the Pythagorean theorem. In the case of a parallelogram, where $a = c, b = d, g = 0$,
one has $2a^2 + 2b^2 = e^2 + f^2$, which is the parallelogram law. Some other themes of Euler come to
mind too, like Diophantine equations: if the points A, B, C, D have integer coordinates and all
distances between points are integers, one has a problem in number theory. For rectangles, this
leads to Pythagorean triples. The problem of perfect Euler bricks comes then to mind,
which asks for a cuboid with integer side and diagonal lengths.
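
The coordinate computation mentioned above can be delegated to a few lines of code. This sketch evaluates both sides of the Euler law for points in the plane or in space; the function name `euler_quadrilateral` and the sample points are choices made here.

```python
import random

def squared_distance(P, Q):
    return sum((p - q) ** 2 for p, q in zip(P, Q))

def euler_quadrilateral(A, B, C, D):
    """Return (a^2 + b^2 + c^2 + d^2, e^2 + f^2 + g^2) with g = 2|MN|."""
    M = [(a + c) / 2 for a, c in zip(A, C)]  # midpoint of diagonal AC
    N = [(b + d) / 2 for b, d in zip(B, D)]  # midpoint of diagonal BD
    sides = (squared_distance(A, B) + squared_distance(B, C)
             + squared_distance(C, D) + squared_distance(D, A))
    diagonals = squared_distance(A, C) + squared_distance(B, D) + 4 * squared_distance(M, N)
    return sides, diagonals

# Unit square: g = 0 and both sides of the identity equal 4.
square = euler_quadrilateral((0, 0), (1, 0), (1, 1), (0, 1))

# Four random points in space, the tetrahedron case.
random.seed(1)
points = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(4)]
tetrahedron = euler_quadrilateral(*points)
```

Since the identity is polynomial in the coordinates, the same function works unchanged in any dimension.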


196. REEB SPHERE THEOREM 


Let M be a closed, compact d-dimensional differentiable manifold. Closed means that the
boundary of M is empty. If $f : M \to \mathbb{R}$ is a smooth real-valued function, then points $x \in M$
with vanishing gradient $\nabla f(x) = 0$ are called critical points. A critical point x is called
non-degenerate, if the Hessian $d \times d$ matrix $H(f)(x)$ is invertible at x. Let c(M) denote
the minimal number of non-degenerate critical points which a function f on M can have. We
say M is a d-sphere, if there is a homeomorphism of M to the standard unit sphere $\{|x| = 1\}$
in $\mathbb{R}^{d+1}$.


Theorem: c(M) = 2 if and only if M is a d-sphere for some d > 0.


The level sets $f^{-1}(c) = \{f = c\}$ of f then form a foliation of M by (d−1)-dimensional
spheres which only degenerate to points at the critical points. The proof of the theorem goes
by showing that a manifold which admits exactly 2 critical points can be covered by 2 balls,
and then uses that this characterizes spheres. The Reeb sphere theorem was proven in 1952 [556].
It is referred to and generalized in [476], which generalizes and improves on results by Milnor and
Rosen. The assumption that f has two critical points does not imply that M is diffeomorphic to
the standard unit sphere. There are exotic spheres which are homeomorphic to the standard
unit sphere but not diffeomorphic to it. The Reeb theorem is covered in [490]. In the first proof
of the existence of exotic 7-spheres, [492], the Reeb Sphere theorem was used as hypothesis H.


197. HAUSDORFF DISTANCE 


Let (X,d) be a metric space. Given a compact subset U of X, let $B_r(U)$ be the set of all points
that are in distance < r from a point of U. In other words $B_r(U) = \bigcup_{x \in U} B_r(x)$, where $B_r(x)$
is the ball $\{y \in X, d(x,y) < r\}$ in X. The Hausdorff distance $\delta$ between two non-empty
compact subsets U, V of X is defined as the infimum over all r > 0 such that $U \subset B_r(V)$ and
$V \subset B_r(U)$. It is a metric on the set $\mathcal{X}$ of all non-empty compact subsets. This space $(\mathcal{X}, \delta)$ is a new metric
space. It is again compact:


Theorem: If (X,d) is compact, then $(\mathcal{X}, \delta)$ is again compact.


The process could therefore be iterated and produce a sequence of compact metric spaces,
where in each step the Hausdorff metric is used on the previous one. For Hausdorff distance,
see [214], Chapter 9, in the context of iterated function systems in the theory of fractals.
A sequence of contractions defines an attractor which can be seen as a limit of a sequence of
compact sets. In the simplest situations, one can then use the Banach fixed point theorem
to establish the existence of a limit. The distance has been used by Maurice Fréchet in 1906 to
measure the distance between curves. The distance was introduced by Felix Hausdorff in 1914
[303] (page 303).
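
For finite point sets in Euclidean space, the infimum in the definition reduces to a max-min expression. The sketch below computes it directly; the function name `hausdorff` is a choice for this example, and `math.dist` requires Python 3.8 or later.

```python
import math

def hausdorff(U, V):
    """Hausdorff distance between two finite non-empty point sets in R^n."""
    def one_sided(S, T):
        # Smallest r such that every point of S is within distance r of T.
        return max(min(math.dist(s, t) for t in T) for s in S)
    return max(one_sided(U, V), one_sided(V, U))

U = [(0.0, 0.0)]
V = [(0.0, 0.0), (3.0, 4.0)]
distance = hausdorff(U, V)
```

The example shows why both inclusions are needed: U lies in every neighborhood of V, but V is only contained in the r-neighborhood of U once r reaches 5.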

The Hausdorff distance also allows to define a distance between compact metric spaces $(X_1, d_1)$,
$(X_2, d_2)$. The Gromov-Hausdorff distance of two compact metric spaces is defined as
the infimum over all possible Hausdorff distances $\delta(\phi_1(X_1), \phi_2(X_2))$, where $\phi_i : X_i \to X$ are
isometric embeddings of $(X_i, d_i)$ into a third metric space (X,d). This metric space of
all compact metric spaces has a dense set of finite metric spaces so that it is separable. It is
also complete, from which one can deduce that it is connected. David Edwards called this
"superspace".


198. GROVE-SEARLE THEOREM 


The set of compact even-dimensional Riemannian manifolds which admit a positive curvature
metric contains spheres $S^{2d}$, projective spaces $\mathbb{RP}^{2d}, \mathbb{CP}^d, \mathbb{HP}^d, \mathbb{OP}^2$ over the division
algebras $\mathbb{R}, \mathbb{C}, \mathbb{H}, \mathbb{O}$, the three Wallach flag manifolds $W^6, W^{12}, W^{24}$ [677] and the Eschenburg
manifold $E^6$ [210]. No other example is known [719]. All these manifolds admit a
positive curvature metric with a continuum isometry group. In particular they admit a metric which
allows for an isometric circle action. The fixed point set N of such an action is never
empty [52]. By a theory started by Conner and Kobayashi it is again a positive curvature
manifold N that is totally geodesic and of even co-dimension. The components of N can have
different dimension but by Lefschetz, the Euler characteristic of N is the Euler characteristic
of M [410]. Let us call a manifold with circular symmetry Grove-Searle if the fixed point
set N has a connected component of co-dimension 2. The Grove-Searle theorem now
tells:


Theorem: If M is Grove-Searle, then $M = S^{2d}$, $\mathbb{RP}^{2d}$ or $\mathbb{CP}^d$.


In odd dimensions, there is beside $M = S^{2d+1}$ or $M = \mathbb{RP}^{2d+1}$ also the possibility of space
forms $S^{2d+1}/\mathbb{Z}_m$. An application of the theorem is that all 2d-dimensional positive curvature
manifolds admitting a circular symmetry have positive Euler characteristic if $2d \leq 8$. Proof: N
is not empty by Berger and $\chi(N) = \chi(M)$. If N has a co-dimension 2 component, Grove-Searle
forces M to be in $\{\mathbb{RP}^{2d}, S^{2d}, \mathbb{CP}^d\}$. By Frankel [228], there can not be two co-dimension 2
connected components. In the remaining cases, Gauss-Bonnet-Chern forces all to have
positive Euler characteristic. There is huge interest in even-dimensional positive curvature
manifolds because of the open Hopf conjecture asking whether every
even-dimensional compact positive curvature manifold has positive Euler characteristic. The
above corollary of Grove-Searle assures that the Hopf conjecture with circle symmetry holds
in dimension $\leq 8$. It is also known for 2d = 10: [552] [567] [696]. See also 2d = 6 in [538] (2.
Edition, Cor. 8.3.3). While in dimension 2 and 4 the classification of positive curvature manifolds
with circular symmetry is known, like $\{S^4, \mathbb{RP}^4, \mathbb{CP}^2\}$ in dimension 4 [332], in dimension 6 one
knows so far the cases $\{S^6, \mathbb{RP}^6, \mathbb{CP}^3, E^6, W^6\}$ and it is not known whether they are all.


199. RADON-NIKODYM THEOREM 


A measurable space $(\Omega, \mathcal{A})$ is a set $\Omega$ equipped with a $\sigma$-algebra $\mathcal{A}$. This means that $\mathcal{A}$ is a set
of subsets of $\Omega$ containing $\Omega$, that is closed under forming complements and the operation of
taking countable unions. A non-negative valued function $f : \Omega \to [0,\infty)$ is called measurable
if $f^{-1}(B) \in \mathcal{A}$ for every B in the Borel $\sigma$-algebra on $[0,\infty)$, the smallest $\sigma$-algebra containing
the open sets. Given two $\sigma$-finite measures $\mu, \nu$ (meaning that $\Omega$ is in each case a countable
union of sets of finite measure) on $(\Omega, \mathcal{A})$, one calls $\mu$ absolutely continuous with respect to
$\nu$, if $\nu(A) = 0$ implies $\mu(A) = 0$. An example is if there exists a function $f \in L^1(\Omega, \mathcal{A}, \nu)$ such
that $\mu(A) = \int_A f(x) \, d\nu(x)$. Then $\mu$ is absolutely continuous with respect to $\nu$ and the function
f is called the Radon-Nikodym derivative of $\mu$ with respect to $\nu$, as $d\mu(x) = f(x) d\nu(x)$
suggests to write $d\mu/d\nu = f$. The Theorem of Radon-Nikodym assures that this situation
is the general case. Let us abbreviate $\mu << \nu$ if $\mu$ is absolutely continuous with respect to $\nu$.


Theorem: If $\mu \ll \nu$, there exists $f \in L^1(\Omega, \mathcal{A}, \nu)$ with $\mu = f \nu$.
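
On a finite set, where every subset is measurable, the Radon-Nikodym derivative can be written down directly as a quotient of point weights. The following is only a minimal sketch under that assumption: the dictionary representation of measures and the helper names `radon_nikodym` and `measure` are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

def radon_nikodym(mu, nu):
    """Density f with mu(A) = sum over A of f(x) nu(x), or None if mu is not << nu.

    mu and nu are dicts with the same keys, mapping points to non-negative weights.
    """
    f = {}
    for x, weight in nu.items():
        if weight == 0:
            if mu[x] != 0:
                return None          # a nu-null point carries mu-mass
            f[x] = Fraction(0)       # the value on a nu-null set is arbitrary
        else:
            f[x] = Fraction(mu[x]) / Fraction(weight)
    return f

def measure(weights, A, density=None):
    """Integrate the density (default: constant 1) over the subset A."""
    return sum((density[x] if density else 1) * weights[x] for x in A)

nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
f = radon_nikodym(mu, nu)
subsets = [A for k in range(len(nu) + 1) for A in combinations(nu, k)]
print(f, all(measure(mu, A) == measure(nu, A, f) for A in subsets))

# A point mass sitting on a nu-null point is not absolutely continuous.
dirac = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(0)}
null_at_a = {"a": Fraction(0), "b": Fraction(1, 2), "c": Fraction(1, 2)}
print(radon_nikodym(dirac, null_at_a))
```

The second call illustrates why absolute continuity is needed: no density can produce mass on a set that the reference measure does not see.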


The theorem is important in probability theory, where the measures under consideration are
usually probability measures, meaning $\mu(\Omega) = 1$. If $\mu$ is absolutely continuous with respect to $\nu$ then
every set of zero probability with respect to $\nu$ has zero probability with respect to $\mu$. An example
of a measure $\mu$ on the Lebesgue space $([0,1], \mathcal{A}, \nu = dx)$ that is not absolutely continuous with
respect to $\nu$ is a Dirac point measure $\delta_x$ for a point $x$
in $[0,1]$. An application of the Radon-Nikodym theorem is the Lebesgue decomposition
of a measure. One can split every $\sigma$-finite measure into an absolutely continuous, a singular
continuous and a pure point part. This is important in spectral theory of mathematical physics
[557]. For measure theory and real analysis in general, see for example [207]. For the history, see
[607] (page 257): the theorem was first proven by Radon in 1913 in $\mathbb{R}^n$ and then by Nikodym
in 1930.


200. CROFTON FORMULA 


If a needle of length $l < 1$ is thrown randomly into a periodic grid of lines spaced distance 1
apart, the probability of hitting a grid line is $2l/\pi$. This method of computing $\pi$ is an example
of a Monte-Carlo method. A probability space of needle configurations can be given as
$(\Omega, \mathcal{A}, \mu) = ([0, 1/2] \times [-\pi/2, \pi/2], \mathcal{A}, 2 \, d\theta \, dr/\pi)$ with product Lebesgue measure, where $r$ is the
minimal distance of the center of the needle to a grid line and $\theta$ is the polar angle. The needle
hits if and only if $r < (l/2) \cos(\theta)$. The probability therefore is obtained by integrating
the density $2/\pi$ over this region. It gives $\int_{-\pi/2}^{\pi/2} \int_0^{(l/2)\cos(\theta)} 2/\pi \, dr \, d\theta = 2l/\pi$. This can now be
generalized for any rectifiable curve of length $l$. One has only to look at the random variable
$X$, which counts the number of intersections of the randomly placed curve with a grid. The
Crofton formula in the plane is now $E[X] = 2l/\pi$ (to see this, approximate the curve by a
polygon and look at each segment $L_j$ as a "needle" of length $l/n$. Then $X = X_1 + \cdots + X_n$ where
$X_j$ counts the number of intersections with $L_j$. By linearity of expectation and additivity of
length, the Crofton formula follows.) One can look at the problem also in $\mathbb{R}^n$, where one has
a system of parallel hyperplanes spaced a unit apart and a rectifiable curve of length $l$. Now,
the volume $|B^{n-1}|$ of the $(n-1)$-dimensional unit ball and the volume $|S^{n-1}|$ of the $(n-1)$-dimensional
sphere matter. Again, $X$ is the number of intersections of the curve with the
periodic plane grid.
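
The planar needle experiment can be simulated directly on the probability space described above. This is a sketch only: the function name `buffon_hit_frequency`, the needle length 0.8, the number of trials and the seed are arbitrary choices made for illustration.

```python
import math
import random

def buffon_hit_frequency(length, trials, seed=0):
    """Fraction of random needles of the given length (< 1) that hit a grid line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = rng.uniform(0.0, 0.5)                       # distance of the center to the nearest line
        theta = rng.uniform(-math.pi / 2, math.pi / 2)  # angle of the needle
        if r < (length / 2) * math.cos(theta):          # hitting condition from the text
            hits += 1
    return hits / trials

length = 0.8
estimate = buffon_hit_frequency(length, trials=200_000)
print("observed frequency:", estimate)
print("predicted 2l/pi:   ", 2 * length / math.pi)
print("estimate of pi:    ", 2 * length / estimate)
```

The convergence is slow, of order one over the square root of the number of trials, which is typical for Monte-Carlo methods.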


Theorem: $E[X] = 2l \, |B^{n-1}|/|S^{n-1}|$.


In the case $n = 2$, one has $|B^1| = 2$, $|S^1| = 2\pi$ and the original Buffon formula follows. The
Buffon needle problem is the first connection between probability theory and geometry. It
appeared first in 1733 and was reproduced again in 1777 by Buffon. Morgan Crofton extended
this in 1868 [153]. The mathematical field of integral geometry started to blossom with Blaschke
[67] in the late 1930s. Probability spaces can be used to study more geometrical quantities
like surface area, or curvature [40] [489]. General references are [590]. The
n-dimensional Crofton formula can be found in [385].


201. DESNANOT-JACOBI IDENTITY 


If $A$ is an $n \times n$ matrix, the matrix entries are accessed as $A_{ij}$. Call $A_i^j$ the matrix obtained by
deleting row $i$ and column $j$ in $A$. The expression $(-1)^{i+j} \det(A_i^j)$ is also known as a cofactor
of the minor $\det(A_i^j)$. Similarly, let $A_{ij}^{kl}$ be the matrix in which rows $i,j$ and columns $k,l$ are
deleted. The Desnanot-Jacobi identity is the following relation between sub-determinants
of a matrix:


Theorem: $\det(A)\det(A_{1n}^{1n}) = \det(A_1^1)\det(A_n^n) - \det(A_1^n)\det(A_n^1)$.
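
The identity can be checked on a concrete integer matrix with exact arithmetic. The sketch below uses a naive Laplace expansion, which is adequate only for small matrices; the helper names `det`, `minor` and `desnanot_jacobi_sides` and the sample matrix are illustrative choices, with rows and columns indexed from 0.

```python
def minor(M, rows, cols):
    """Delete the given rows and columns (0-based indices) from the square matrix M."""
    return [[M[i][j] for j in range(len(M)) if j not in cols]
            for i in range(len(M)) if i not in rows]

def det(M):
    """Determinant by Laplace expansion along the first row; the empty matrix has det 1."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det(minor(M, [0], [j])) for j in range(len(M)))

def desnanot_jacobi_sides(A):
    """Return both sides of the identity for the first and last row and column."""
    last = len(A) - 1
    lhs = det(A) * det(minor(A, [0, last], [0, last]))
    rhs = (det(minor(A, [0], [0])) * det(minor(A, [last], [last]))
           - det(minor(A, [0], [last])) * det(minor(A, [last], [0])))
    return lhs, rhs

A = [[2, -1, 3, 0],
     [1, 4, -2, 5],
     [0, 3, 1, -1],
     [7, 2, 0, 6]]
print(desnanot_jacobi_sides(A))
print(desnanot_jacobi_sides([[1, 2], [3, 4]]))   # n = 2: the interior matrix is empty
```

Solving the identity for $\det(A)$ and applying it recursively gives the condensation scheme, as long as the interior determinants do not vanish.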


It allows to write $\det(A)$ in terms of the $(n-2) \times (n-2)$ matrix in which the boundary rim
is removed and all the four possible $(n-1) \times (n-1)$ matrices, where one boundary row and
boundary column is removed from the matrix. In the case when $n = 2$ the identity still works if
one interprets $\det(A_{1n}^{1n}) = \det(A_{12}^{12})$ as 1, which is usually assumed the value for the determinant
of the empty matrix. In that case, the Desnanot-Jacobi identity is just
$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$.
The identity is also called the Desnanot-Jacobi adjoint matrix theorem. A generalization
is called the Sylvester determinant identity. The Desnanot-Jacobi identity leads to a
process called Dodgson condensation or Alice in Wonderland condensation because
Charles Lutwidge Dodgson is also known as Lewis Carroll, the author of "Alice in Wonderland"
[111]. The condensation method was described in 1866. The Desnanot result appears
first in 1819, in a book on page 152, and again in 1827. Like for Cauchy-Binet, it is
historically remarkable that this identity was found before matrices were formed. Indeed, the
word "matrix", related to the Latin word "mater" for mother, was later used in a more generalized
sense as "womb". The word "matrix" therefore appeared because matrices are devices which
bear determinants. There are more references in [408].


202. EXISTENCE OF MINIMAL SURFACES 


A 2-dimensional surface $S$ in $\mathbb{R}^n$ is the image of a parametrization $r(u,v) : R \to \mathbb{R}^n$, where
$R \subset \mathbb{R}^2$ is the parameter domain, an open, simply connected region in the plane which one
can assume to be the unit disc with circle $C$ as boundary. The surface is called a minimal
surface if every component of $r$ is harmonic, $\Delta r = 0$, and furthermore $E - G = |r_u|^2 - |r_v|^2 = 0$
and $F = r_u \cdot r_v = 0$, expressing that the Riemannian metric $g$ on $S$ is conformal. The Plateau
problem is to find for a given simple closed curve $\Gamma$ in $\mathbb{R}^n$ a minimal surface $S$ which has $\Gamma$
as the boundary. One wants the map $r$ to be smooth in $R$ and continuous up to the boundary
$C$. The surface $S$ does not necessarily have to be embedded ($r$ is not necessarily injective), it
can just be immersed.
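
To see the three conditions at work, one can test them numerically on a classical example. The Enneper surface used below is not discussed in the text; it is an assumption brought in for illustration, as are the helper name `partials`, the step size and the sample point. Derivatives are approximated by central differences, so the printed values are only close to zero.

```python
def enneper(u, v):
    """Standard parametrization of the Enneper surface in R^3 (illustrative example)."""
    return (u - u**3 / 3 + u * v**2, v - v**3 / 3 + v * u**2, u**2 - v**2)

def partials(r, u, v, h=1e-3):
    """Central differences for r_u, r_v and the Laplacian r_uu + r_vv."""
    center = r(u, v)
    up, um = r(u + h, v), r(u - h, v)
    vp, vm = r(u, v + h), r(u, v - h)
    r_u = [(a - b) / (2 * h) for a, b in zip(up, um)]
    r_v = [(a - b) / (2 * h) for a, b in zip(vp, vm)]
    lap = [(a + b + c + d - 4 * m) / h**2 for a, b, c, d, m in zip(up, um, vp, vm, center)]
    return r_u, r_v, lap

r_u, r_v, lap = partials(enneper, 0.3, -0.7)
E = sum(a * a for a in r_u)
G = sum(b * b for b in r_v)
F = sum(a * b for a, b in zip(r_u, r_v))
print("E - G =", E - G)        # conformality: should be close to 0
print("F     =", F)            # conformality: should be close to 0
print("Laplacian =", lap)      # harmonicity: each component close to 0
```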


Theorem: There is a solution to the Plateau problem. 


Note that this does not mean that the solution is unique. Indeed, in general there are multiple
solutions, even though generically only finitely many. In general, solutions also can have branch
points, self-intersections or can be physically unstable and so would be difficult to observe in
soap bubble experiments. The problem was first solved independently by Tibor Rado in 1930 and
Jesse Douglas in 1931. If more generally, the region $R$ has larger genus and so several boundary curves,
the problem is called the Douglas problem. When looking at how soap films change in
dependence of parameters, huge changes like catastrophes can happen. For example,
if $\Gamma$ is changed, suddenly solutions to a genus one Douglas problem can appear as they have lower
energy [226]. In order to solve the Plateau problem one is led to the variational problem of
extremizing the Dirichlet integral $L(r) = \int\int_R |r_u|^2 + |r_v|^2 \, du \, dv$. The harmonicity condition
$\Delta r = 0$ is the Euler equation of the variational problem. This is a special case of a Dirichlet
principle. The problem was raised by Joseph-Louis Lagrange in 1760 and named after the
physics and anatomy professor Joseph Plateau who made experiments. Poisson realized that
soap films are surfaces of constant mean curvature. In higher dimensions, the problem has
led to geometric measure theory. We followed partly [149]. More information is in [640],
where also the history of soap films and soap bubbles is described as one of the oldest objects
in mathematical analysis and where it is pointed out that for a long time, since Lagrange's derivation
of the minimal surface equation, the analysis was too difficult even for mathematicians like
Riemann, Weierstrass or Schwarz. In [226] (part I) there is more history, many pictures
and relations to nature, where minimal films appear as the most economical surfaces forming skeletons
of radiolarians, tiny marine organisms.


203. FERMAT’S RIGHT ANGLE THEOREM 


A positive integer is a congruent number if it is the area of a right triangle with rational
sides. The 3-4-5 triangle for example has the area $n = 6$ so that 6 is a congruent number.
The 3/2, 20/3, 41/6 triangle has area $n = 5$. The example $n = 5$ shows that one has to
use rational numbers in general. If $x, y, z$ are the side lengths of the triangle, then the condition is
$x^2 + y^2 = z^2$, $xy = 2n$. Rational Pythagorean triples can be generated with $x = u^2 - v^2$, $y = 2uv$,
$z = u^2 + v^2$. This leads to congruent numbers $n = uv(u^2 - v^2)$. For $u = 3, v = 2$ for example,
one gets the 12-5-13 triangle with area 30. Fermat showed:


Theorem: No square number can be a congruent number 


Fermat's proof from 1670 using descent can be found in a self-contained way in [139]. While
integer solutions $(x,y,z)$ can be found by a finite search for a fixed $n$, the task to find rational
solutions $x,y,z$ for a given $n$ can be difficult. For example, the smallest example for $n = 101$
found by Bastien in 1914 is $x = 711024064578955010000/q$, $y = 3967272806033495003922/q$,
$z = 4030484925899520003922/q$ with $q = 118171431852779451900$ [118]. It shows that already for
smaller $n$, the smallest rational numbers $x, y, z$ solving the problem can become complicated.
Arabic mathematicians have known that numbers like 5, 6, 14, 15, 21, 30, 34, 65, 70, 110, 154, 190
were congruent numbers. Leonardo Pisano (Fibonacci) established that $n = 7$ is a congruent
number with $(x, y, z) = (35/12, 24/5, 337/60)$ and conjectured that no square can be a congruent
number. Fermat then with his method of infinite descent proved that no square is a congruent
number. Already $n = 1$ is interesting as it illustrates the descent method: if $n = 1$ is congruent
then $x^4 = y^4 + z^2$ has a non-trivial solution. Let $a$ be a rational number such that $a^2 + n$, $a^2 - n$
are squares of rational numbers. Then $x = \sqrt{a^2+n} + \sqrt{a^2-n}$, $y = \sqrt{a^2+n} - \sqrt{a^2-n}$, $z = 2a$
is a solution as $x^2 + y^2 = 4a^2 = z^2$ and $xy/2 = ((a^2+n) - (a^2-n))/2 = n$. Work of 1922 by Louis Mordell related
the congruent number problem to elliptic curves. If $u$ is so that $u^2 + n$, $u^2 - n$ are rational
squares, then $u^4 - n^2$ is a rational square $v^2$ so that $u^6 - n^2 u^2 = u^2 v^2$. With $x = u^2$, $y = uv$
this gives $y^2 = x^3 - n^2 x$. So, if $n$ is a congruent number, there is a rational point on the
curve $y^2 = x^3 - n^2 x$. Kurt Heegner proved in his 1952 paper that if a prime $p$ is congruent
to 5 or 7 modulo 8, then $p$ is a congruent number and that if a prime $p$ is congruent to 3 or
7 modulo 8 then $2p$ is a congruent number [62]. Jerold Tunnell (a student of Tate) showed
in 1983 that the congruent number problem would have a full solution under the Birch
and Swinnerton-Dyer conjecture, one of the Millennium problems. Having that established
would allow to test in finitely many steps whether a given integer $n$ is a congruent number or
not.
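
The triangles quoted in this section, and the passage from a triangle to a rational point on $y^2 = x^3 - n^2 x$, can be checked with exact rational arithmetic. The following is a small sketch; the function names `area`, `triangle_from` and `curve_point` are hypothetical, and the point is built with $u = z/2$, for which $u^2 \pm n = ((x \pm y)/2)^2$.

```python
from fractions import Fraction as Fr

def area(x, y, z):
    """Area of the right triangle with legs x, y and hypotenuse z (sides are checked)."""
    x, y, z = Fr(x), Fr(y), Fr(z)
    assert x * x + y * y == z * z, "not a right triangle"
    return x * y / 2

def triangle_from(u, v):
    """Pythagorean triple generated by u > v > 0."""
    return u * u - v * v, 2 * u * v, u * u + v * v

def curve_point(x, y, z):
    """Rational point (X, Y) on Y^2 = X^3 - n^2 X obtained from a triangle of area n."""
    n = area(x, y, z)
    x, y, z = Fr(x), Fr(y), Fr(z)
    u = z / 2                          # u^2 + n and u^2 - n are rational squares
    X = u * u
    Y = u * abs(x * x - y * y) / 4     # Y = u v with v^2 = u^4 - n^2
    assert Y * Y == X ** 3 - n * n * X
    return X, Y

print(area(3, 4, 5))                              # 6
print(area(Fr(3, 2), Fr(20, 3), Fr(41, 6)))       # 5
print(area(Fr(35, 12), Fr(24, 5), Fr(337, 60)))   # 7
print(triangle_from(3, 2), area(*triangle_from(3, 2)))
print(curve_point(3, 4, 5))
```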


204. STARK-HEEGNER THEOREM 


An imaginary quadratic field $K = \mathbb{Q}[\sqrt{-n}]$ has class number 1 if there is a unique prime
factorization in the ring of integers of $K$. Carl Friedrich Gauss found already 9 cases $n \in \{1, 2, 3, 7, 11, 19, 43, 67, 163\}$.
These cases turned out to be all and are now called Heegner numbers.


Theorem: There are exactly 9 imaginary quadratic fields of class number 1. 
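
For small $n$ the statement can be explored by computer. The sketch below relies on a classical fact that is not explained in the text: the class number of an imaginary quadratic field equals the number of reduced primitive binary quadratic forms $ax^2 + bxy + cy^2$ whose discriminant $b^2 - 4ac$ is the field discriminant. The function names and the search bound 1000 are illustrative assumptions, and a finite search of course proves nothing about larger $n$.

```python
from math import gcd, isqrt

def class_number(D):
    """Count reduced primitive forms (a, b, c) with b^2 - 4ac = D < 0."""
    assert D < 0 and D % 4 in (0, 1)
    h, a = 0, 1
    while 3 * a * a <= -D:                 # reduced forms satisfy 3 a^2 <= |D|
        for b in range(-a + 1, a + 1):     # -a < b <= a
            num = b * b - D
            if num % (4 * a):
                continue
            c = num // (4 * a)
            if c < a or (a == c and b < 0):
                continue                   # not reduced
            if gcd(gcd(a, b), c) == 1:
                h += 1
        a += 1
    return h

def field_discriminant(n):
    """Discriminant of Q(sqrt(-n)) for squarefree n >= 1."""
    return -n if n % 4 == 3 else -4 * n

def squarefree(n):
    return all(n % (k * k) for k in range(2, isqrt(n) + 1))

found = [n for n in range(1, 1001)
         if squarefree(n) and class_number(field_discriminant(n)) == 1]
print(found)
```

Within the searched range, only the nine values found by Gauss appear.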


The theorem is now known as the Stark-Heegner theorem. Kurt Heegner proved this in 1952 
[308]. The proof was for more than a decade labeled to “have a gap", but it got rehabilitated by 
Harold Stark in 1969 thanks also to who was one of the first to recognize Heegner’s 
achievement in his 1952 paper. The introduction of Heegner’s paper is a master piece, skillfully 
pointing out how the class number theory has relations to the congruent number problem
that historically has led Fermat to his descent method. As Stark pointed out, the dismissal
of Heegner's proof must also have been due to professional bias as there was just one step
missing, showing that a concrete equation $x^{24} - a x^8 - 16 = 0$ has a six degree factor whose
coefficients are algebraic integers of degree 1, for which he refers to Weber's textbook [682].
About the origin of bias [62]: Heegner was a fine mathematician, with a rather low-grade post
in a gymnasium in East Berlin. It was a widely held view that the trouble in Heegner's proof
should be traced to Weber. Today thanks to Stark, it is now clear that the gap was actually
not existent and Heegner's proof correct. Stark also points out that Weber's part is correct but
could have given more details, if he had seen any need to do so. One can justify the name
Stark-Heegner theorem because Stark not just gave a new clarified proof but took the trouble
to investigate whether there was indeed a mistake in Heegner's proof. Bryan Birch certainly
also played an important role in discovering Heegner as also "Heegner's numbers" got into the
spot light in the context of the Birch and Swinnerton-Dyer conjecture and the Gross-Zagier
theorem. The largest Heegner number got a bit of a "cult status" as it appears in
Ramanujan's constant $e^{\pi \sqrt{163}}$ that is less than $10^{-12}$ close to the integer $640320^3 + 744$. This
can be justified by the fact that if $n$ is a Heegner number, then the j-invariant of $(1+\sqrt{-n})/2$
is an integer and a q-expansion gives then a theoretical error of the order $O(e^{-\pi \sqrt{163}})$.


205. EQUICHORDAL POINT THEOREM 


If $C$ is a smooth convex curve in the plane, a point $P$ in its interior is called an equichordal
point if all the chords of $C$ through $P$ have the same length. For the circle $C$, this happens
at the center. For the polar curve $r(t) = 2 + \sin(t)$, the origin is an equichordal point.
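
For the polar curve $r(t) = 2 + \sin(t)$ the claim is a one-line computation: the chord through the origin in direction $t$ has length $r(t) + r(t + \pi) = 4$. A short numerical confirmation, with helper names chosen for illustration:

```python
import math

def radius(t):
    """Polar curve r(t) = 2 + sin(t) from the text."""
    return 2 + math.sin(t)

def chord_length(t):
    """Length of the chord through the origin in direction t."""
    return radius(t) + radius(t + math.pi)

lengths = [chord_length(2 * math.pi * k / 360) for k in range(360)]
print(min(lengths), max(lengths))   # both agree with 4 up to rounding
```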


Theorem: A convex curve can not have two equichordal points. 


The problem had been posed by Fujiwara in 1916 [235] and appeared in a problem section of
Blaschke, Rothe and Weitzenböck [68]. It seems that also Erdős was independently conjecturing
this, as Gabriel Andrew Dirac's work of 1952 indicates [174]. The conjecture was proven by
Marek Rychlik in 1997, who established it more generally for star-like curves. The proof
uses methods from dynamical systems, complex analysis and algebraic geometry.


206. LUCAS FUNDAMENTAL THEOREM 


The Fibonacci sequence $F(n)$ is defined by the second order recursion $F(0) = 0$, $F(1) = 1$ and
$F(n+1) = F(n) + F(n-1)$. When looking at the prime factorizations one can notice that
the even terms $F(2n)$ have lots of prime divisors while the odd terms $F(2n+1)$ have only a
few. Indeed, it follows from Lucas work that all primes appear as factors of the even Fibonacci
numbers. Let GCD denote the greatest common divisor.


Theorem: GCD(F(m), F(n)) = F(GCD(m,n)). 
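
The identity is sensitive to the indexing: it holds with the convention $F(0) = 0$, $F(1) = 1$, which the sketch below assumes. It checks the theorem for small indices and also the relation $F(2n) = F(n)L(n)$ with the Lucas numbers $L(1) = 1$, $L(2) = 3$; the function names and the range of indices are illustrative choices.

```python
from math import gcd

def fibonacci(n):
    """F(0) = 0, F(1) = 1, F(n+1) = F(n) + F(n-1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def lucas(n):
    """Same recursion with L(0) = 2, L(1) = 1, so that L(2) = 3."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

strong_divisibility = all(
    gcd(fibonacci(m), fibonacci(n)) == fibonacci(gcd(m, n))
    for m in range(1, 40) for n in range(1, 40)
)
doubling = all(fibonacci(2 * n) == fibonacci(n) * lucas(n) for n in range(1, 40))
print(strong_divisibility, doubling)
```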


This fundamental theorem of Lucas of 1878 tells that the sequence $F(n)$ is a strong
divisibility sequence. Together with Lucas law of apparition and Lucas law of repetition,
it implies that every integer divides infinitely many Fibonacci numbers [431]. In the
context of primality testing, Lucas also looked at the Lucas numbers $L(n)$, which satisfy
the same recursion but have a different initial condition $L(1) = 1$, $L(2) = 3$. One has then
$F(2n) = F(n)L(n)$. Lagarias proved in 1985 an analogue of the Chebotarev Density Theorem
using a method of Hasse. He showed that the density of prime divisors of the Lucas
sequence is 2/3 [432]. That article mentions that it is believed that the set of primes dividing
the terms $U(n)$ of any non-degenerate second order linear recurrence has a positive density and
that this is conditionally true under the assumption of the generalized Riemann hypothesis.
A bit about the history (see [432]): the Fibonacci sequence appeared first in the third book of
"Liber Abbaci" of Leonardo Pisano from 1227, a book that contains 90 sample problems, with
50 from Arabic sources. It also contains the famous rabbit problem. Edouard Lucas had been
an artillery officer in the Franco-Prussian war and then was a high school teacher in Paris, who
also was interested in recreational mathematics and invented the tower of Hanoi problem [452].


207. HILBERT DISTANCE 


The Hilbert distance $d(x,y)$ is defined for points $x, y$ in a bounded convex domain $X$ in a
Hilbert space: construct the line through $x, y$. It intersects the boundary of $X$ in exactly
two points $p, q$. The Hilbert distance is now defined as $d(x,y) = \frac{1}{2} \log(C(x,y,p,q))$, where
$C(x,y,p,q) = (|x-p||y-q|)/(|y-p||x-q|)$ is the cross ratio between these four points. Due
to its projective invariance, the Hilbert distance defines then also a Hilbert distance on the
projective space $\mathbb{RP}^{n-1}$ which has the property that positive $n \times n$ matrices are contractions.
Let us call a metric on the projective space Perron-Frobenius if it has this property.


Theorem: The Hilbert metric is the unique Perron-Frobenius metric. 


In the simplest case $P^1$, elements are described as $t = [1, t]$ with $t \in \mathbb{R} \cup \{\infty\}$. The Hilbert metric
then is $d(t,s) = |\log(t/s)|$. A positive matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ maps $[1,t]$ to $[1, (c+dt)/(a+bt)]$.


David Hilbert defined the Hilbert metric in 1895 in a letter to Felix Klein [315]. The Hilbert
metric between two points depends on the domain in which the points are considered. The
larger the domain, the smaller the distance. Also, if $z$ is in the line segment $[x,y]$, then
$d(x,y) = d(x,z) + d(z,y)$. For a strictly convex region, there is a unique geodesic (with respect
to this metric) connecting two points. It was Garrett Birkhoff [63] and Hans Samelson [582],
who independently first suggested to use the Banach fixed point theorem to prove the Perron-Frobenius
theorem [536] [233] [234] stating that a positive matrix has a unique maximal
eigenvalue [443]. Birkhoff called it the projective metric. For that, one only needs the mere
existence of a Hilbert metric and not the uniqueness. Uniqueness is shown in [413].
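
In the one-dimensional case the contraction property can be observed numerically. The sketch below works with positive $t$, where $d(t,s) = |\log(t/s)|$, and with the action $t \to (c+dt)/(a+bt)$ of a matrix with positive entries $a, b, c, d$; the sampling ranges, the seed and the helper names are assumptions made for illustration.

```python
import math
import random

def hilbert_distance(t, s):
    """Hilbert metric on the positive part of the projective line."""
    return abs(math.log(t / s))

def act(matrix, t):
    """Action [1, t] -> [1, (c + d t)/(a + b t)] of the matrix ((a, b), (c, d))."""
    (a, b), (c, d) = matrix
    return (c + d * t) / (a + b * t)

rng = random.Random(1)
worst_ratio = 0.0
for _ in range(10_000):
    matrix = ((rng.uniform(0.1, 5), rng.uniform(0.1, 5)),
              (rng.uniform(0.1, 5), rng.uniform(0.1, 5)))
    t, s = rng.uniform(0.01, 100), rng.uniform(0.01, 100)
    before = hilbert_distance(t, s)
    if before < 1e-9:
        continue                       # skip nearly identical points
    after = hilbert_distance(act(matrix, t), act(matrix, s))
    worst_ratio = max(worst_ratio, after / before)
print("largest observed contraction ratio:", worst_ratio)
```

Every sampled ratio stays below 1, which is what makes the Banach fixed point argument for the Perron-Frobenius theorem work.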


208. GROSS-ZAGIER 


The projective special linear group $G = PSL(2,\mathbb{Z})$ is the group of integer matrices $A$ of
determinant 1 for which the matrices $A$ and $-A$ are identified. It is also called the modular
group as its elements act as Möbius transformations $z \to T_{a,b,c,d}(z) = (az + b)/(cz + d)$
on the upper half plane $H \subset \mathbb{C}$. A congruence subgroup $\Gamma$ of $G$ is a subgroup of $G$
which contains a principal congruence subgroup $\Gamma(N)$, the set of matrices in $G$ congruent to the
identity matrix modulo $N$. The smallest $N$ for which this happens is called the level of $\Gamma$.
An important example is the Hecke congruence group $\Gamma_0(N) = \{T_{a,b,c,d}, \; N | c\}$. A modular
elliptic curve $E$ is a quotient $H/\Gamma$, where $\Gamma$ is a congruence subgroup of the modular group.
Elliptic curves are the simplest positive-dimensional projective algebraic curves that carry a
commutative algebraic group structure. The set of rational points $E(\mathbb{Q})$ is finitely generated
by the Mordell-Weil theorem, so that $E(\mathbb{Q})$ is isomorphic to $\mathbb{Z}^r \times T$, where $r \geq 0$ is called
the rank of $E$ and $T$ is a finite Abelian group called the torsion subgroup of $E$. The Birch
and Swinnerton-Dyer conjecture claims that $r$ is the order $\mathrm{ord}_{s=1}(L(E,s))$ of $L(E,s)$ at
$s = 1$, where the L-function $L(s)$ for an elliptic curve $E$ over $K$ is an explicitly given Dirichlet
series $L(s) = \sum_{n=1}^{\infty} a_n n^{-s}$. [It can be defined as follows: for a prime $p$ let $F_{p^e}$ denote the field with
$p^e$ elements. Define $t_{p^e}(E) = p^e + 1 - |E(F_{p^e})|$ and the counting zeta function
$\zeta_p(z) = \exp(\sum_{e=1}^{\infty} t_{p^e}(E) z^e/e)$ at $p$, and $1_E(p) = 1$ if $p$ does not divide $N$ and $1_E(p) = 0$ if $p | N$. Then
$\zeta_p(z) = (1 - t_p(E) z + 1_E(p) p z^2)^{-1}$. The L-function is then defined as the Euler product
$L(s) = \prod_{p \; \mathrm{prime}} \zeta_p(p^{-s})$. While the Dirichlet series only converges for $\mathrm{Re}(s)$ larger than the
abscissa of convergence, one knows from work in the 1970s like Shimura that in the modular
case, $L$ has an analytic continuation to all of $\mathbb{C}$.] The j-invariant $j(\tau)$ is a modular function
of weight zero on $G$. It can be explicitly written down and was originally used to represent
isomorphism classes of elliptic curves. It is known that the field of modular functions is
$\mathbb{C}(j)$. If $\tau$ is an element of an imaginary quadratic field with positive imaginary part, then $j(\tau)$
is an algebraic integer by a result of Theodor Schneider from 1937. Now, a modular elliptic
curve can be parametrized as $r(z) = (j(z), j(Nz)) \in \mathbb{C}^2$, where $N$ is the level of $\Gamma$. If $w \in H$ is
a quadratic irrational number (a number of the form $a + b\sqrt{D} \in H$ with rational $a, b$) which
solves $NAw^2 + Bw + C = 0$ then $w$ and $Nw$ both have the same discriminant $D = B^2 - 4NAC$
so that $P = r(w) \in E(\mathbb{Q}(D))$. This $P$ is called a Heegner point on $E$ [62]. The global
canonical height function $h : E \to \mathbb{R}$ is a function on $E$ with the property that $h(Q) = 0$ if $Q$
is a torsion point and such that the parallelogram law $h(P+Q) + h(P-Q) = 2h(P) + 2h(Q)$
holds for all pairs of points $P, Q$ on $E$. It is difficult to compute but the Gross-Zagier formula
relates it in an explicit way with the order of the root at 1 of the function $L$:


Theorem: The height h(P) of a Heegner point is a non-zero multiple of L’(1). 


This implies that if $L'(1) = 0$, then $P$ is a torsion point and that if $L'(1) \neq 0$, then the rank $r$
of $E$ is positive. Heegner points have been used to construct a rational point on the curve of
infinite order. The theorem was later used to prove much of the Birch and Swinnerton-Dyer
conjecture for rank 1 elliptic curves. illuminates the history of the theorem.
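
The numbers $p + 1 - |E(F_p)|$ entering the Euler factors of the L-function can be computed by brute force for small primes. The sketch below does this for the curve $y^2 = x^3 - x$, which is the congruent number curve $y^2 = x^3 - n^2 x$ for $n = 1$; the choice of this curve, the list of primes and the function names are assumptions made for illustration. It also checks the classical Hasse bound $|t_p| \leq 2\sqrt{p}$, which is not stated in the text.

```python
def count_points(a, b, p):
    """Number of points on y^2 = x^3 + a x + b over F_p, including the point at infinity."""
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1
    affine = sum(square_roots.get((x**3 + a * x + b) % p, 0) for x in range(p))
    return affine + 1

def trace(a, b, p):
    """t_p(E) = p + 1 - |E(F_p)|."""
    return p + 1 - count_points(a, b, p)

primes = [5, 7, 11, 13, 17, 19, 23]
traces = {p: trace(-1, 0, p) for p in primes}
print(traces)
print(all(t * t <= 4 * p for p, t in traces.items()))   # Hasse bound
```

For good primes $p$ the local factor is then $(1 - t_p z + p z^2)^{-1}$ evaluated at $z = p^{-s}$.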


209. SCHUR DETERMINANT IDENTITY 


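The Schur determinant identity is an identity for partitioned matrices $M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$,
where $A, B, C, D$ are all $n \times n$ matrices. Assuming $A$ is invertible, one can write
$M = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1}B \\ 0 & D - CA^{-1}B \end{pmatrix}$.
Using the Cauchy-Binet product formula, one gets the Schur
identity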


Theorem: $\det(M) = \det(A)\det(D - CA^{-1}B)$
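
A quick exact check of the identity for $2 \times 2$ blocks is given below. It is a sketch only: determinants are computed by Laplace expansion and the inverse by the adjugate, which is reasonable for such small blocks but not for large matrices, and the sample blocks and helper names are arbitrary.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (empty matrix: 1)."""
    if not M:
        return Fraction(1)
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(A):
    """Inverse via the adjugate matrix (transpose of the cofactor matrix)."""
    n, d = len(A), det(A)
    def cofactor(i, j):
        sub = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(sub)
    return [[cofactor(j, i) / d for j in range(n)] for i in range(n)]

def frac(M):
    return [[Fraction(x) for x in row] for row in M]

A = frac([[2, 1], [1, 3]])
B = frac([[0, 1], [4, -1]])
C = frac([[1, 2], [3, 5]])
D = frac([[7, 0], [2, 1]])
M = [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]
CAinvB = matmul(matmul(C, inverse(A)), B)
schur_complement = [[D[i][j] - CAinvB[i][j] for j in range(2)] for i in range(2)]
print(det(M), det(A) * det(schur_complement))
```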


The matrix $D - CA^{-1}B$ is called the Schur complement. Given two $n \times m$ matrices $F, G$
one can compare the determinant of a product $AB$ of partitioned matrices built from $F$ and $G$ with the determinant of $BA$
to get the Weinstein-Aronszajn identity $\det(1 + F^T G) = \det(1 + G^T F)$. See [653].
This identity also follows from the formula $\det(1 + F^T G) = \sum_P \det(F_P)\det(G_P)$ involving the
summation over all minors [399]. (Compare that the classical Cauchy-Binet formula for $n \times m$
matrices $F, G$ states $\det(F^T G) = \sum_P \det(F_P)\det(G_P)$ which is a sum over all $m \times m$ minors.
For $n = m$, it becomes the product formula $\det(F^T G) = \det(F^T)\det(G)$.) In [653] many more
identities are listed like $\det(A + BC) = \det(A)\det(1 + CA^{-1}B)$ if $A$ is invertible (which means
especially $\det(A + B) = \det(A)\det(1 + A^{-1}B)$ which is a special case of the Schur identity for
$C = D = 1$) or $\det(A + B)\det(A - B) = \det(B)\det(AB^{-1}A - B)$, if $B$ is invertible.


210. HERMAN’S SUBHARMONICITY THEOREM


If $(\Omega, \mathcal{A}, \mu)$ is a probability space, $T$ an automorphism and $A \in L^\infty(\Omega, SL(2,\mathbb{C}))$, define the
non-abelian Birkhoff product $A^n(x) = A(T^{n-1}(x)) A(T^{n-2}(x)) \cdots A(Tx) A(x)$. An example
is when $\Omega$ is a 2-manifold, $T$ an area-preserving diffeomorphism $\Omega \to \Omega$ and $A(x) = dT(x)$
is the Jacobian. An other example is when $(Lu)(n) = u(n+1) + u(n-1) + V(T^n x) u(n)$, where the
equation $Lu = Eu$ leads to the transfer matrix $A(x) = \begin{pmatrix} E - V(x) & -1 \\ 1 & 0 \end{pmatrix}$.
The Lyapunov exponent $\lambda(A) = \lim_{n \to \infty} \frac{1}{n} \int_\Omega \log \|A^n(x)\| \, d\mu(x)$
exists because it is a limit of a sub-additive sequence. Assume $z \in \mathbb{C}^d \to A(z) \in SL(2,\mathbb{C})$ is analytic
in the sense that each matrix entry is an analytic function in each of the variables. Assume
also that $T : D^d \to D^d$ is analytic in a neighborhood of the polydisc $D^d$ and maps the boundary
$\Omega = \mathbb{T}^d$ into itself and that $T(0) = 0$ and $T$ preserves the Haar measure on $\Omega$. Herman's
theorem [312] is


Theorem: $\lambda(A) \geq \lambda(A(0)) = \log(\max |\sigma(A(0))|)$


The reason is that $z \to \log(\|A^n(z)\|)$ is pluri-subharmonic so that the integral over the torus
is bounded below by the value at $0$. For example, if $p(z) = c(z + z^{-1})/2$ and
$T(z) = wz$ with $w = e^{i\alpha}$ induces the dynamical system $T(\theta) = \theta + \alpha \mod 2\pi$ on the boundary
$\mathbb{T}^1$, then the Lyapunov exponent of $A(\theta) = \begin{pmatrix} c\cos(\theta) & -1 \\ 1 & 0 \end{pmatrix}$ over the dynamical system is
larger or equal than $\log(c/2)$. The reason is that the Lyapunov exponent of
$B(z) = zA(z) = \begin{pmatrix} c(z^2+1)/2 & -z \\ z & 0 \end{pmatrix}$
is bounded below by the logarithm of the spectral radius of
$B(0) = \begin{pmatrix} c/2 & 0 \\ 0 & 0 \end{pmatrix}$, which is $\log(c/2)$. An other application is if $A \in L^\infty(\Omega, SL(2,\mathbb{C}))$ is arbitrary and
$T : (\Omega, \mathcal{A}, \mu) \to (\Omega, \mathcal{A}, \mu)$ is a dynamical system, then for $A_\beta(x) = A(x) R(\beta)$, where $R(\beta)$ denotes the rotation by the angle $\beta$,
the Lebesgue measure of values $\beta$ with $\lambda(A_\beta) > 0$ is positive if $A \notin SU(2,\mathbb{C})$ on a set of
positive measure. This can be used to show that the set of $A \in L^\infty(\Omega, SL(2,\mathbb{C}))$ with $\lambda(A) > 0$
is dense [392]. The method of Herman has been extended in various ways: one can use the Jensen
inequality in complex analysis to show that for a non-constant real analytic $f$ and
$A(x) = \begin{pmatrix} E - c f(x) & 1 \\ -1 & 0 \end{pmatrix}$
and dynamical system $T(x) = x + \alpha \mod 2\pi$ with irrational $\alpha$, the Lyapunov
exponent of $A$ is positive for all $E$, if $c$ is large enough. Herman's and the Sorets-Spencer
theorem are the starting point in [79].
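
The lower bound $\log(c/2)$ for the cocycle $A(\theta) = \begin{pmatrix} c\cos(\theta) & -1 \\ 1 & 0 \end{pmatrix}$ over an irrational rotation can be compared with a numerical estimate. This is a rough sketch: the value $c = 6$, the golden rotation number, the starting angle and the number of steps are arbitrary assumptions, and a finite orbit only approximates the limit defining the Lyapunov exponent.

```python
import math

def lyapunov_estimate(c, alpha, theta0=0.3, steps=200_000):
    """Estimate (1/n) log |A^n(theta0) v| for A(theta) = [[c cos(theta), -1], [1, 0]]."""
    x, y = 1.0, 0.0
    theta, total = theta0, 0.0
    for _ in range(steps):
        x, y = c * math.cos(theta) * x - y, x     # apply A(theta) to the vector (x, y)
        norm = math.hypot(x, y)
        total += math.log(norm)                   # accumulate the growth
        x, y = x / norm, y / norm                 # renormalize to avoid overflow
        theta = (theta + alpha) % (2 * math.pi)
    return total / steps

c = 6.0
alpha = 2 * math.pi * (math.sqrt(5) - 1) / 2      # golden rotation (irrational)
estimate = lyapunov_estimate(c, alpha)
print("numerical estimate:", estimate)
print("Herman lower bound:", math.log(c / 2))
```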


211. GABRIEL’S THEOREM 


A quiver $(V, E)$ is an other word for a multidigraph, a directed graph in which multiple
directed connections (arrows) and self connections (loops) are allowed. The graph defined
by $(V, E)$ is the multigraph one obtains if the directions of the arrows are ignored. A representation
of a quiver assigns a vector space $V(x)$ over an algebraically closed field to each node
$x \in V$ and a linear map $V(x \to y) : V(x) \to V(y)$ to each arrow $x \to y$.
It is indecomposable if it can not be written as the direct sum of smaller positive dimensional
representations. A quiver is of finite type, if it has only finitely many isomorphism classes
of indecomposable representations. The quiver diagrams are formed by the simply laced
Dynkin diagrams $A_n, D_n, E_6, E_7, E_8$. Gabriel's theorem classifies the connected quivers of
finite type.


Theorem: Connected quivers of finite type correspond to quiver diagrams 


The theorem was proven by Peter Gabriel in 1972 [238]. Written in German, the article uses
the word "Köcher" for quiver. Peter Gabriel (1933-2015) was a French and
Swiss mathematician also known as Pierre Gabriel. On Wikipedia, he is listed as a student of
Alexander Grothendieck with a thesis done in 1960 on Abelian categories (on his personal
website which is still active, Henri Cartan was listed as the jury and Jean-Pierre Serre as
the rapporteur; on the Mathematics Genealogy page, Jean-Pierre Serre is listed as the advisor).
[According to Serre, Gabriel wrote an independent thesis; Serre pointed out that in 1960, the
advisor status had not yet been as formal as today. In the published article, it is also not
visible who the formal advisor was.] Remarkably, Gabriel was doing his military service 1960-1962
just after finishing his thesis and the Abelian category paper was submitted in 1961.
Gabriel worked at the University of Zürich from 1974-1998.


212. ZECKENDORF REPRESENTATION 


Let $F(n)$ denote the n'th Fibonacci number. It is defined by the recursion $F(n+1) =
F(n) + F(n-1)$ and $F(0) = 0$, $F(1) = 1$, $F(2) = 1$. Given a positive integer $n$, a representation
$n = \sum_{k=0}^{m} F(c(k))$ with $c(k) \geq 2$ and $c(k+1) > c(k) + 1$ is called a Zeckendorf representation.
The finite sequence $n_F = (c(0), c(1), \dots, c(m))$, in a notation of Knuth, is called the Fibonacci
coding of $n$. For example, $11 = (1010000)_F$ and $13 = (10000000)_F$.


Theorem: Every positive integer has a unique Zeckendorf representation. 


Edouard Zeckendorf published this in 1972 and mentions to have proven it already in 1939.
Lekkerkerker independently found the result in 1952 [117]. The proof of existence and uniqueness
can both be done by induction. As Donald Knuth realized [406], the Zeckendorf representation
of an integer leads to an associative multiplication $x \circ y = \sum_{j,k} F(c_j(x) + c_k(y))$ for
positive integers $x, y$. This is called the Fibonacci product. The proof of associativity is the
realization that $(x \circ y) \circ z$ is equal to $\sum_{i,j,k} F(c_i(x) + c_j(y) + c_k(z))$. Knuth mentions
that the Fibonacci product asymptotically satisfies $x \circ y \sim \sqrt{5} xy$ and that the multiplication
$x * y = xy + [\phi x][\phi y]$ by Porta and Stolarsky is asymptotically $(1 + \phi^2) xy \sim 3.62 xy$, where
$\phi = (1 + \sqrt{5})/2$ is the golden ratio.
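
The existence proof is constructive: a greedy choice of the largest Fibonacci number not exceeding $n$ produces the Zeckendorf representation. The sketch below implements this together with the Fibonacci product; the function names are hypothetical and the index lists are returned in increasing order.

```python
def fib(n):
    """F(0) = 0, F(1) = 1, F(n+1) = F(n) + F(n-1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def zeckendorf(n):
    """Indices c(0) < c(1) < ... with n = sum of F(c(k)), c(k) >= 2, no two adjacent."""
    indices = []
    while n > 0:
        k = 2
        while fib(k + 1) <= n:     # largest k with F(k) <= n
            k += 1
        indices.append(k)
        n -= fib(k)
    return indices[::-1]

def fibonacci_product(x, y):
    """Knuth's product: sum of F(c_j(x) + c_k(y)) over both Zeckendorf representations."""
    return sum(fib(j + k) for j in zeckendorf(x) for k in zeckendorf(y))

print(zeckendorf(11), zeckendorf(13))
print(fibonacci_product(4, 4))
associative = all(
    fibonacci_product(fibonacci_product(x, y), z) == fibonacci_product(x, fibonacci_product(y, z))
    for x in range(1, 13) for y in range(1, 13) for z in range(1, 13)
)
print(associative)
```

The greedy step never picks two adjacent indices, because after subtracting $F(k)$ the remainder is smaller than $F(k-1)$.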


213. TURAN’S THEOREM 


A finite simple graph $G = (V, E)$ has $n = |V|$ vertices and $m = |E|$ edges. A p-clique is
a complete subgraph of $G$ with $p$ vertices. The 1-cliques can be identified with $V$ and the
2-cliques can be identified with $E$. Turán's graph theorem [662] is


Theorem: If $m > \frac{p-2}{p-1} \frac{n^2}{2}$ then $G$ has a p-clique.


It assures that a triangle free graph can have at most $n^2/4$ edges, so that a graph with more
than $n^2/4$ edges must contain a triangle. This is called Mantel's
theorem from 1907. The Turán graphs are graphs of the form $P_{n_1} + \dots + P_{n_{p-1}}$ where for all $n_j$,
we have $n_j \in \{a, a+1\}$ for some integer $a$. If the $n_j = n/(p-1)$ are constant, these are graphs
without p-cliques and with $B(p-1,2) (n/(p-1))^2 = \frac{p-2}{p-1} \frac{n^2}{2}$ edges, where $B(p-1,2)$ is a binomial coefficient. This shows that the result is sharp.
[12] contains four short proofs, the first one doing induction with respect to n. See also [IJ] 
who states that the theorem of Turán initiated extremal graph theory and that the theorem
had been rediscovered many times.
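
Sharpness can be observed on a small example by brute force. The sketch below builds the balanced complete multipartite graph on $n = 9$ vertices with $p - 1 = 3$ classes and searches for cliques by checking all vertex subsets, which is only feasible for tiny graphs; the function names and parameters are illustrative.

```python
from itertools import combinations

def turan_graph(n, parts):
    """Edge set of the complete multipartite graph with `parts` nearly equal classes."""
    cls = [v % parts for v in range(n)]
    return {(u, v) for u, v in combinations(range(n), 2) if cls[u] != cls[v]}

def has_clique(n, edges, p):
    """Brute-force test whether some p vertices are pairwise connected."""
    return any(all(pair in edges for pair in combinations(S, 2))
               for S in combinations(range(n), p))

n, p = 9, 4
edges = turan_graph(n, p - 1)
bound = (p - 2) * n * n / (2 * (p - 1))
print(len(edges), bound)                        # the bound is attained
print(has_clique(n, edges, p))                  # no 4-clique
print(has_clique(n, edges | {(0, 3)}, p))       # one more edge creates a 4-clique
```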


214. THE SZPILRAJN-MARCZEWSKI THEOREM 


A finite simple graph $\Gamma = (V, E)$ is represented by a set of sets $G$ if $V = G$ and $E =
\{(x,y) \mid x \neq y, x \cap y \neq \emptyset\}$. The graph $\Gamma$ is the connection graph of the set of sets $G$.


Theorem: Every graph is the connection graph of a set of sets. 


An arbitrary set of sets is sometimes also called a multigraph. The theorem shows that from
the point of view of connectivity, a multigraph can be studied by its connection graph. It
does not encode other properties like the subset property. The set of sets $G = \{\{1,2\}, \{2,3\}\}$ and
the set of sets $H = \{\{1,2\}, \{2\}\}$ both have the same connection graph $K_2$. The theorem was
shown by Edward Szpilrajn-Marczewski (1907-1976) in 1945 [645]. The Polish mathematician
was born Szpilrajn but changed his name while hiding from Nazi persecution. Erdős, Goodman
and Pósa showed in 1964 that one can realize any graph of $n$ vertices as a set of subsets of
a set with $[n^2/4]$ elements and that for $n \geq 4$, one can even require all sets to be distinct.
The result is sharp for $n \geq 4$. The smallest number $d(n)$ of elements needed to represent every
graph with $n$ vertices satisfies $d(2) = 2$, $d(3) = 3$ and $d(n) = [n^2/4]$ for $n \geq 4$. For example
$d(4) = [4^2/4] = 4$ and $d(5) = [5^2/4] = 6$. The Erdős-Goodman-Pósa proof is done by induction
$n \to n + 2$ and by first establishing the cases 4 and 5 which can be done by looking at all cases.
The Szpilrajn-Marczewski theorem has been abbreviated SM theorem and is a much
referenced theorem in intersection graph theory. The theorem does not assume the graph
to be finite.
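
One simple way to realize a given graph as a connection graph is to attach to every vertex the set of its incident edges. The sketch below follows this idea, which is an assumption about how one may prove the theorem and not necessarily the original argument; a private label is added to each set so that isolated vertices get distinct non-empty sets. The function names are hypothetical.

```python
from itertools import combinations

def represent(vertices, edges):
    """Map each vertex to the set of its incident edges together with a private label."""
    return {v: frozenset({("vertex", v)} | {frozenset(e) for e in edges if v in e})
            for v in vertices}

def connection_graph(sets):
    """Edges between distinct names whose sets intersect; `sets` maps names to sets."""
    return {frozenset((a, b)) for a, b in combinations(sets, 2) if sets[a] & sets[b]}

# A 5-cycle with one extra isolated vertex.
vertices = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
recovered = connection_graph(represent(vertices, edges))
print(recovered == {frozenset(e) for e in edges})

# The two examples from the text have the same connection graph K_2.
G = {"x": {1, 2}, "y": {2, 3}}
H = {"x": {1, 2}, "y": {2}}
print(connection_graph(G) == connection_graph(H))
```

This representation uses far more elements than the $[n^2/4]$ of Erdős, Goodman and Pósa, which require a cleverer choice of sets.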


215. SAKAI THEOREM 


Let $B(H)$ denote the Banach algebra of all bounded linear operators on a Hilbert space $H$.
The commutant $X'$ of a subset $X \subset B(H)$ is the set of all elements in $B(H)$ that commute
with every element in $X$. Because of the contra-variance condition $X \subset Y \Rightarrow Y' \subset X'$, the
bicommutants satisfy $X'' \subset Y''$ so that, using $B(H)' = \mathbb{C}$, $\mathbb{C}' = B(H)$, any subset $X$ is
contained in the bicommutant $X''$. A subalgebra $X$ satisfying $X = X''$ is called a von
Neumann algebra. It is called a factor if its center $X \cap X'$ is $\mathbb{C}$. Von Neumann showed
the bicommutant theorem stating that $X'' = X$ is equivalent to $X$ being weakly closed.
(The weak operator topology means pointwise convergence in the sense that $A_n \to A$ in
the weak operator topology if and only if for every pair $f, g \in H$ one has $(g, A_n f) \to (g, A f)$,
meaning that given a basis in $H$, the matrix elements of the operators converge pointwise.)
The bicommutant theorem is remarkable as it equates the algebraic bicommutant condition
with the topological weak-closed condition. Von Neumann algebras can also be defined more
abstractly using $C^*$ algebras without referral to operator algebras but the GNS construction
justifies the more intuitive operator algebra definition. Like the bicommutant theorem, there
are other characterizations of von Neumann algebras. One of them is Sakai's theorem


Theorem: A C* algebra is von Neumann if and only if it has a pre-dual.


Sakai’s theorem was proven in 1956 [581]. Examples of von Neumann algebras are X = B(H),
any finite dimensional subalgebra X of the algebra of operators B(H) or any algebra
$X = (S \cup S^*)''$ generated by an arbitrary subset S of B(H). For example, every commutative von Neumann
algebra is of the form $L^\infty(\Omega, \mathcal{A}, \mu)$; the predual is then $L^1(\Omega, \mathcal{A}, \mu)$. Since $L^\infty(\Omega, \mathcal{A}, \mu)$ (acting as
multiplication operators on $H = L^2(\Omega, \mathcal{A}, \mu)$) for a measure $\mu$ completely encodes the measure
theory of $(\Omega, \mathcal{A}, \mu)$, the theory of von Neumann algebras has been seen as non-commutative
measure theory. This is the picture of Alain Connes. Von Neumann algebras are pretty
well understood: each is a direct integral of factors. Factors are classified as type I (meaning
that it has a non-zero minimal projection like operator algebras on Hilbert spaces), type II
(meaning that there is a non-zero finite projection) or type III (meaning that it contains
no non-zero finite projection). There are other characterizations of von Neumann algebras: the
Kaplansky density theorem states that if A is a C* subalgebra of an operator algebra B(H)
then the unit ball of A is strongly dense in the unit ball of the weak closure of A. This implies
that a subalgebra M of B(H) containing 1 is a von Neumann algebra if and only if the unit
ball of M is weakly closed. More references are [5577] [66] [176] [670].


216. TAKENS’S THEOREM


Let M be a d-dimensional manifold and $T : M \to M$ be a smooth map from M to M. A compact
T-invariant set $A \subset M$ is called an attractor for T if there is an open neighborhood
N of A such that $\bigcap_{n \geq 0} T^n(N) = A$. It is called a minimal attractor if no proper
sub-attractor exists. The map T is called partially hyperbolic if the Lyapunov exponent
$\lambda(\mu) = \lim_{n \to \infty} n^{-1} \int_A \log|dT^n(x)| \, d\mu(x)$ is non-zero for some T-invariant measure $\mu$ on A,
where dT(x) is the Jacobian matrix. The partial hyperbolic attractor is called strange if it is
not a countable union of lower dimensional sets homeomorphic to varieties in M. This happens
for example if A is a fractal, meaning that the Hausdorff dimension of A is not an integer. A
Takens embedding of M is given by a transformation T, a $C^2$ function $f : M \to \mathbb{R}$
and an integer k and defined as the time series $x \to (f(x), f(T(x)), \dots, f(T^{k-1}x)) \in \mathbb{R}^k$. One
can often reconstruct M and so also the attractor A from such measurements. This happens
Baire generically in $C^2(M, M) \times C^2(M,\mathbb{R})$.


Theorem: For a Baire generic set of pairs T, f, a Takens embedding exists.


One can therefore use a dynamical system $T : M \to M$ to embed M into some Euclidean space
$\mathbb{R}^k$. This Takens embedding theorem is analogous to the Whitney embedding theorem
which assures that if f is allowed to be $\mathbb{R}^m$ valued, then A can be embedded in $\mathbb{R}^m$, even for a
time series with k = 1 observation so that no dynamics is needed: $f : M \to \mathbb{R}^m$ embeds M and
so A into a Euclidean space. The significance of Takens’s theorem is that one can “see"
M or the attractor A using a time series of a single real observable $f : M \to \mathbb{R}$ and then
use time, that is the dynamical system, to generate the coordinates of the embedding. This
is extremely practical. One can for example observe the times when a drop leaves a faucet
and use the differences of the times between two drops to create an attractor without having
any model of drop formation. The time series of course does not work in general, as the function
f and the transformation T must be interesting enough. For the identity T for example, the
time series does not give enough information. A special case is if A consists of a single point a
which is hyperbolic in the sense that all eigenvalues of the Jacobian matrix dT(a) are smaller
than 1 in absolute value. In that case, the manifold M is the stable manifold of a and that
remains true for an open set of transformations near T. Takens’s theorem then implies that for
a generic $C^2$ function f, one can choose a k such that the time series reconstructs the manifold
M. The same works if A is a hyperbolic attractor, because the structural stability of T
then allows to restrict the genericity statement to the function f. Floris Takens (1940-2010)
was a Dutch mathematician. Together with David Ruelle, he introduced the notion of strange
attractor. See [162] for dynamical systems in general. Takens’s article is in [357] (p 366-381).
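
As a minimal sketch of how the delay coordinates $(f(x), f(Tx), \dots, f(T^{k-1}x))$ are formed from a single observable, the following Python snippet iterates a map, records one coordinate along the orbit and cuts the scalar series into vectors. The Hénon map as a stand-in for T, the first coordinate as observable f, and the names `henon`, `first_coordinate`, `delay_vectors` are illustrative assumptions and are not taken from Takens's article.

```python
def henon(p, a=1.4, b=0.3):
    """One step of the Henon map; an assumed stand-in for the transformation T."""
    x, y = p
    return (1.0 - a * x * x + y, b * x)


def first_coordinate(p):
    """Assumed observable f: M -> R, here simply the first coordinate."""
    return p[0]


def delay_vectors(T, f, p0, k, count, burn_in=100):
    """Return `count` delay vectors (f(x), f(Tx), ..., f(T^(k-1) x)) along the orbit of p0."""
    p = p0
    for _ in range(burn_in):          # discard the transient before sampling
        p = T(p)
    series = []
    for _ in range(count + k - 1):    # scalar time series f(T^j x)
        series.append(f(p))
        p = T(p)
    return [tuple(series[i:i + k]) for i in range(count)]


vectors = delay_vectors(henon, first_coordinate, (0.0, 0.0), k=2, count=500)
```

For this particular map the second state coordinate is b times the previous first coordinate, so the pairs in `vectors` are a linear image of the orbit points; for a general system no such explicit relation is available and one relies on the genericity statement.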


217. PERFECT GRAPHS 


A finite simple graph is called perfect if every induced subgraph has a chromatic number
(minimal number of colors needed for a vertex coloring) which is equal to the clique number
of the graph. (The clique number is the maximal number n of vertices for which there exists a complete
subgraph $K_n$ with that number of vertices). A finite graph satisfies the Berge condition, if
none of the induced subgraphs is a cyclic graph $C_{2n+1}$ with $n \geq 2$ nor the complement
of such a cyclic graph. The strong perfect graph theorem states:


Theorem: The set of perfect graphs is the set of Berge graphs. 


Because the odd cycle condition is invariant under graph complement formation, the following 
weak perfect graph theorem follows: if G is perfect, then its graph complement is perfect. 
Examples of perfect graphs are trees, bipartite graphs, wheel graphs with even boundary length 
or Barycentric refinements of graphs (the graph in which the cliques are the vertices and two 
cliques are connected if one is contained in the other, where obviously the dimension function 
is a coloring and agrees with the clique number). The strong perfect graph conjecture had 
been conjectured by Berge in 1961 [51]. Maria Chudnovsky, Neil Robertson, Paul Seymour and 
Robin Thomas proved the theorem in 2006 [127]. 
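
The definition can be checked by brute force on very small graphs. The sketch below compares chromatic number and clique number on every induced subgraph; it is only meant as an illustration (exponential running time), and the helper names `clique_number`, `chromatic_number`, `is_perfect` and `cycle` are our own.

```python
from itertools import combinations, product


def clique_number(vertices, edges):
    """Largest size of a vertex subset in which all pairs are adjacent."""
    for size in range(len(vertices), 0, -1):
        for sub in combinations(vertices, size):
            if all(frozenset(p) in edges for p in combinations(sub, 2)):
                return size
    return 0


def chromatic_number(vertices, edges):
    """Smallest number of colors admitting a proper vertex coloring (brute force)."""
    vs = list(vertices)
    for colors in range(1, len(vs) + 1):
        for assignment in product(range(colors), repeat=len(vs)):
            col = dict(zip(vs, assignment))
            if all(col[a] != col[b] for a, b in map(tuple, edges)):
                return colors
    return 0


def is_perfect(n, edge_list):
    """Check chromatic number == clique number on every induced subgraph."""
    edges = {frozenset(e) for e in edge_list}
    for size in range(1, n + 1):
        for sub in combinations(range(n), size):
            induced = {e for e in edges if e <= set(sub)}
            if chromatic_number(sub, induced) != clique_number(sub, induced):
                return False
    return True


def cycle(n):
    """Edge list of the cyclic graph C_n on the vertices 0, ..., n-1."""
    return [(i, (i + 1) % n) for i in range(n)]
```

With these helpers, `is_perfect(5, cycle(5))` is false (the 5-cycle needs 3 colors but has clique number 2), while the even cycles `cycle(4)` and `cycle(6)` are perfect, in line with the Berge condition.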


218. KOCHEN-SPECKER THEOREM 


Let H be a Hilbert space and let $\mathcal{X}$ denote the set of self-adjoint operators on H. These
operators A are also known as quantum mechanical observables. The mathematical framework
of quantum mechanics considers a time evolution $\dot{\psi} = iL\psi$ with a Hamiltonian L and then
does for $A \in \mathcal{X}$ produce data $\langle \psi(t), A\psi(t) \rangle$ (Schrödinger picture) or $\langle \psi, A(t)\psi \rangle$ with
$A(t) = U(t)^* A U(t)$ with unitary $U(t) = \exp(itL)$ (Heisenberg picture). Since $\mathcal{X}$ is non-commutative,
one can not expect to do measurements as in the classical calculus. The non-commutativity is
illustrated best with the famous commutation relation [P,Q] = i which holds for the
self-adjoint operators $Pf(x) = if'(x)$, $Qf(x) = xf(x)$ on $L^2(\mathbb{R})$ which represent momentum and
position of a particle on the real line. Before John Bell and Simon Kochen and Ernst Specker,
it was not excluded that one could use some hidden variables and still be close to a classical
theory. By formulating this precisely, one can also produce theorems. A function $v : \mathcal{X} \to \mathbb{R}$
is called a classical value function, if it is linear on $\mathcal{X}$ and satisfies f(v(A)) = v(f(A)) for all
continuous functions f as well as v(AB) = v(A)v(B). In other words, v is a multiplicative linear
functional on $\mathcal{X}$, honoring the functional calculus and being compatible with multiplication.
For a continuous real function, the value f(A) is defined by the functional calculus which exists
by the spectral theorem for any self-adjoint operator. The Kochen-Specker theorem is a no-go
theorem:


Theorem: If $\dim(H) \geq 3$, there is no classical value function.


It was proven by Simon Kochen and Ernst Specker in 1967 even in the case when the
dimension is 3 or higher and complements Bell’s theorem on “hidden variables". An important
precursor was Gleason’s theorem. Kochen and Specker show more generally that there is a
partial Boolean algebra D which has no homomorphism into $Z_2$. It is refreshingly simple and elegant,
especially considering the difficulties that surround interpretations of quantum mechanics. A
bit simpler is the argument if the dimension of H is assumed to be 4 or higher: let $u_1, u_2, u_3, u_4$
be four orthogonal vectors in H and let $P_k$ be the projection operators onto the line spanned by
$u_k$. They satisfy $P_1+P_2+P_3+P_4 = 1$ so that by linearity, $v(P_1)+v(P_2)+v(P_3)+v(P_4) = 1$. The
condition v(AB) = v(A)v(B) implies for projections P (elements in $\mathcal{X}$ satisfying $P^2 = P$)
that $v(P^2) = v(P) = v(P)v(P)$ so that v(P) = 0 or 1. The linearity condition now implies that
exactly one value is 1. A simplification of [535] uses the following list of 11 inconsistent equations
for 20 vectors, which can not be satisfied: each vector appears 2 or 4 times, so that the sum of
all right hand sides is even, but the sum of the left hand sides is 11 and so odd.


1 = v({1,0,0,0]) + v([0, 1, 0,0]) + v([0, 0, 1, 0]) + u([0, 0, 0, 1]) 

1 = v([1,0,0,0]) + v([0, 1, 0, 0}) + v([0, 0, 1, 1]) + v([0, 0, 1, -1]) 

1 = v([1,0,0,0]) + v([0, 0, 1,0]) + v([0, 1, 0, 1]) + v([0, 1,0, —1]) 

1 = v({1,0,0,0]) + v([0, 0, 0, 1]) + v([0, 1, 1, 0]) + v([0, 1, —1, 0]) 

1 = v({-1,1,1,1)) + o((1, —1, 1, 1) + o((1, 1, —1, 1]) + o((1, 1, 1, —1J) 
1 = v([-1,1,1,1]) + o((1, 1,—1, 1]) + o((1, 0, 1, OJ) + v([0, 1,0, —1]) 
1 v((1, —1, 1, 1]) + o((1, 1, —1, 1]) + v((0, 1, 1, O}) + v([1, 0, 0, —1]) 
1 = v(fl,1,—1,1]) + o([1, 1, 1, —1]) + v((0, 0, 1, 1]) + v([1, —1, 0, 0}) 
1 = v({0,1,—1,0]) + o([1, 0,0, —1]) + o((1, 1,1, 1]) + o(f1, —1, -1, 1J) 
1 = v((0,0,1, —1]) + o([1, —1,0,0]) + o((1, 1, 1, 1]) + o(f1, 1, —1, —1]) 
1 = v({1,0,1,0)]) + ([0, 1,0, 1]) + o([1, 1, —1, —1]) + v([1,—1, -1, 1) . 
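
The parity argument can be verified mechanically. The following sketch (the names `ROWS`, `dot`, `counts` are ours) checks that each of the 11 equations involves four mutually orthogonal vectors, that exactly 20 distinct vectors occur, and that each occurs an even number of times, so that adding all equations would give an odd number on the left and an even number on the right.

```python
from collections import Counter

# The 11 orthogonal frames from the list of equations, one row per equation.
ROWS = [
    [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
    [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, -1)],
    [(1, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1), (0, 1, 0, -1)],
    [(1, 0, 0, 0), (0, 0, 0, 1), (0, 1, 1, 0), (0, 1, -1, 0)],
    [(-1, 1, 1, 1), (1, -1, 1, 1), (1, 1, -1, 1), (1, 1, 1, -1)],
    [(-1, 1, 1, 1), (1, 1, -1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, -1, 1, 1), (1, 1, -1, 1), (0, 1, 1, 0), (1, 0, 0, -1)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (0, 0, 1, 1), (1, -1, 0, 0)],
    [(0, 1, -1, 0), (1, 0, 0, -1), (1, 1, 1, 1), (1, -1, -1, 1)],
    [(0, 0, 1, -1), (1, -1, 0, 0), (1, 1, 1, 1), (1, 1, -1, -1)],
    [(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, -1, -1), (1, -1, -1, 1)],
]


def dot(u, w):
    """Euclidean inner product of two integer vectors."""
    return sum(a * b for a, b in zip(u, w))


# Every equation uses an orthogonal basis of R^4.
all_orthogonal = all(
    dot(row[i], row[j]) == 0
    for row in ROWS for i in range(4) for j in range(i + 1, 4)
)

# Number of equations in which each vector appears.
counts = Counter(vec for row in ROWS for vec in row)
all_even = all(c % 2 == 0 for c in counts.values())
odd_number_of_equations = len(ROWS) % 2 == 1
```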


219. PERFECT DIFFERENCE SETS 


A subset D of $Z_m$ is called a perfect difference set if every nonzero number in $Z_m$ can be
written uniquely as a - b for $a,b \in D$. An example for m = 13 is $D = \{1,2,5,7\} \subset Z_{13}$. For
D to exist we need $m = n^2+n+1$ and |D| = n+1. The number n is called the order of
the perfect difference set. Any perfect difference set D produces a finite projective plane
P(2,n) with $m = n^2+n+1$ lines. Singer showed in 1938 [610] that perfect difference sets exist
if $n = p^k$ is a prime power:


Theorem: For every prime power $n = p^k$ there exists a finite projective plane P(2,n).


Singer obtained the perfect difference set in the following way: Let $\zeta$ be a generator of the
multiplicative group in the Galois field $G_3 = F_{n^3}$ which is a Galois extension of $G_1 = F_n$. Then
$\zeta$ is the root of an irreducible cubic polynomial over $G_1$ so that every element can be written as
$a + b\zeta + c\zeta^2$, $a,b,c \in G_1$. Every element different from 0 in $G_3$ can be written as $\zeta^k$. Look at
all elements $D = \{k, \zeta^k = a + b\zeta$ for $a,b \in G_1\} \cup \{0\}$. Two such elements are called equivalent
if one is the multiple of the other. The equivalence classes partition all numbers into n + 1
equivalence classes. If they are written as $a_i + b_i\zeta = \zeta^{k_i}$, then the set of exponents $k_i$ is a perfect
difference set. The prime power conjecture claims that for any finite projective plane the
order is a prime power. One already does not know whether there exists a projective plane of
order n = 12. The prime power conjecture has been verified for all $n < 2 \cdot 10^9$ by Gordon.
Sarah Peluse recently showed that the number of positive integers n < N such that
$Z_{n^2+n+1}$ contains a perfect difference set is asymptotically N/log(N), giving more evidence for
the prime power conjecture. Perfect difference sets are related to Sidon sets, sets D for which a+b = c+d
for $a,b,c,d \in D$ implies {a,b} = {c,d}. Small sets typically are Sidon sets. Sidon sets D can not
be too large as $|D|(|D| + 1)/2 \leq 2n$ implies $|D| \leq 2\sqrt{n}$. The set $D = \{(x,x^2), x \in Z_p\}$ is a
Sidon set in $Z_p^2$.
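
The example D = {1,2,5,7} in $Z_{13}$ can be checked directly. The sketch below also forms the translates D + i as "lines", which is the usual way a perfect difference set is turned into a projective plane; this construction is stated here as an assumption since the text does not spell it out, and the function names are ours.

```python
def is_perfect_difference_set(D, m):
    """True if every nonzero residue mod m is a difference a - b (a, b in D) exactly once."""
    diffs = sorted((a - b) % m for a in D for b in D if a != b)
    return diffs == list(range(1, m))


def lines(D, m):
    """The m translates D + i, interpreted as the lines of a plane on the points Z_m."""
    return [frozenset((d + i) % m for d in D) for i in range(m)]


D, m = {1, 2, 5, 7}, 13
plane = lines(D, m)
# Sizes of the pairwise intersections of distinct lines.
meets = {len(p & q) for i, p in enumerate(plane) for q in plane[i + 1:]}
```

Here `meets` comes out as `{1}`: any two distinct lines share exactly one point, which is the incidence axiom of a projective plane of order n = 3 with 13 points and 13 lines.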


220. TRACE CAYLEY-HAMILTON THEOREM 


For a $n \times n$ matrix A, let $p_A(x) = \det(A - x) = \sum_{k=0}^n c_{n-k} x^k$ denote its characteristic
polynomial. The Cayley-Hamilton theorem $p_A(A) = 0$ assures that $\sum_{k=0}^n c_{n-k} A^k = 0$.
While obvious for matrices which allow diagonalization (like normal operators), the
Cayley-Hamilton theorem is remarkably non-shallow. The trace Cayley-Hamilton theorem is


Theorem: $k c_k + \sum_{j=1}^{k} \mathrm{tr}(A^j) c_{k-j} = 0$


This implies that if all trace powers are zero, then $p_A(x) = (-x)^n$. The reason for the name
trace Cayley-Hamilton theorem is that for k > n, the result can be obtained from the
Cayley-Hamilton theorem $\sum_{j=0}^n c_{n-j} A^j = 0$ by multiplying with $A^{k-n}$ and taking traces. The trace
Cayley-Hamilton theorem implies also that if two $n \times n$ matrices have the same traces $\mathrm{tr}(A^k) = \mathrm{tr}(B^k)$
for k = 1,...,n, then A, B have the same characteristic polynomial and so are isospectral.
This is extremely useful as computing the traces of n matrices can be more convenient than
computing the characteristic polynomial. The theorem is especially useful in theoretical
settings. For normal matrices one can conclude that A is the zero matrix if $\mathrm{tr}(A^k) = 0$
for k = 1,...,n. See [716]. For moment problems see [589]. The Cayley-Hamilton theorem
was first tackled in 1853 by William Rowan Hamilton in the context of quaternions, meaning
for n = 2 complex or n = 4 real matrices. Arthur Cayley stated the theorem in 1858 for $n \leq 3$
but only proved n = 2. In 1878, the general case was proven by Ferdinand Georg Frobenius.
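
Read as a recursion, the trace identity determines the coefficients $c_1, \dots, c_n$ from the traces of powers, starting with the leading coefficient $c_0 = (-1)^n$ of det(A - x). The following sketch does this with exact rational arithmetic; the function names are ours and the code is an illustration rather than an efficient algorithm.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def char_poly_from_traces(A):
    """Return [c_0, ..., c_n] with det(A - x) = sum_k c_{n-k} x^k, using only tr(A^j)."""
    n = len(A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    traces = []
    for _ in range(n):
        power = matmul(power, A)
        traces.append(trace(power))          # traces[j-1] = tr(A^j)
    c = [Fraction((-1) ** n)]                # c_0 is the coefficient of x^n
    for k in range(1, n + 1):
        s = sum(traces[j - 1] * c[k - j] for j in range(1, k + 1))
        c.append(-s / k)                     # solve k c_k + sum_j tr(A^j) c_{k-j} = 0
    return c
```

For A = [[2, 1], [1, 3]] this returns the coefficients 1, -5, 5 of det(A - x) = x^2 - 5x + 5.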


221. MINIMAL PERMANENT


The permanent of a $n \times n$ matrix A is $\mathrm{per}(A) = \sum_{\pi} \prod_{i=1}^n A_{i\pi(i)}$, where the sum is over all
permutations $\pi$ of {1,2,...,n}. It takes the Leibniz definition $\det(A) = \sum_{\pi} \mathrm{sign}(\pi) \prod_{i=1}^n A_{i\pi(i)}$
of the determinant but ignores the signatures $\mathrm{sign}(\pi)$ of the permutations. Unlike
determinants which can be computed in polynomial time using row reduction, there is no
known way to compute permanents in polynomial time. A probability vector
$p = (p_1,\dots,p_n)$ is an element in $\mathbb{R}^n$ for which all entries are in [0,1] and add up to 1. A $n \times n$
matrix is doubly stochastic, if each row and each column of A is a probability vector. In 1926
Bartel van der Waerden conjectured that the minimal permanent which a doubly stochastic
$n \times n$ matrix can have, is obtained if all entries are 1/n. These are the matrices with maximal
entropy in the sense that the Shannon entropy $S(p) = -\sum_{k=1}^n p_k \log(p_k)$ is maximal for
each column or row of the matrix.


Theorem: Doubly stochastic minimal permanent $\Leftrightarrow$ maximal entropy.


The van der Waerden conjecture was proven in 1980 by Béla Gyires and in 1981 by 
G.P. Egorychev and by D.I. Falikman. In [286] it was pointed out that the conjecture had 
already been proven in 1977 [284]. For permanents, see [493]. Béla Gyires was a Hungarian 
mathematician who lived from 1909 to 2001. In his last paper [287], Gyires gives another
account of the proof of the van der Waerden conjecture and two proofs.
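
For small n the permanent can be evaluated straight from the definition. The sketch below (names are ours) does so with exact fractions and compares the uniform matrix, whose permanent is $n!/n^n$, with two other doubly stochastic 3 x 3 matrices chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def permanent(A):
    """Permanent via the sum over all permutations; feasible only for small n."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total


n = 3
h = Fraction(1, 2)
uniform = [[Fraction(1, n)] * n for _ in range(n)]             # all entries 1/n
identity = [[int(i == j) for j in range(n)] for i in range(n)]  # a permutation matrix
mixed = [[h, h, 0], [h, 0, h], [0, h, h]]                        # another doubly stochastic matrix

values = {
    "uniform": permanent(uniform),
    "identity": permanent(identity),
    "mixed": permanent(mixed),
}
```

The uniform matrix gives 2/9, while the other two matrices tried here give the larger values 1/4 and 1.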


222. BILLIARDS IN POLYGONS


A convex compact polygon in $\mathbb{R}^2$ defines a billiard dynamical system. Parametrize the
boundary by $x \in T = \mathbb{R}/\mathbb{Z}$. Given $(x_1,x_2) \in T^2$ where both points are not at a vertex of the
polygon, we get a new point $x_3$ such that the path $x_1, x_2, x_3$ satisfies the law of reflection at $x_2$.
The set of points in $T^2$ for which no future point $x_k$ is a vertex has full measure. A point $x_0$
is called a periodic point if $x_n = x_0$ for some n > 0 and the $x_k$ are all points not on vertices of
the polygon. It is unknown already in the case of an obtuse triangle, whether a periodic point
exists. Fagnano already observed in 1775 that any acute triangle has a periodic trajectory, the
orthic triangle. A polygon is called rational if all angles $\alpha_i$ have the property that
$\alpha_i/\pi$ is rational.


Theorem: A rational polygon has a periodic orbit. 


Actually, there is a dense set of directions $\theta$ for which there is a periodic orbit. This is called
the Masur theorem, named after Howard Masur who proved this in 1986 by reducing
the problem to flows defined by $e^{i\theta}\phi$ where $\phi$ is a holomorphic 1-form on a compact Riemann
surface R of genus $\geq 2$. More generally, if q is a holomorphic quadratic differential on such
an R, there exists a dense set of $\theta$ such that $e^{i\theta}q$ has a closed regular vertical trajectory. The
existence theorem uses Teichmüller theory. The basic questions about billiards in polygons
have been raised by Carlo Boldrighini, Michael Keane and Federico Marchetti in 1978 [104].

Billiards in polygons are also interesting from an ergodic point of view. A Baire generic polygon
produces an ergodic flow. For rational polygons, this is not the case as the directions of the
flow stay in a finite set generated by the rational angles $\pi n_i/m_i$ at the vertices. There is then
an interval $[0,\pi/n)$ which parametrizes invariant hypersurfaces in the phase space. One knows
that for Lebesgue almost all directions $\theta \in [0,\pi/n)$ the flow is uniquely ergodic, even weakly mixing
but not mixing and has zero entropy. This implies that there exists a generic set of ergodic and
even weakly mixing (non mixing) polygons (they are then non-rational) with n vertices. For
more on billiards in polygons, see [281] [648] [288].


223. ELASTICITY 


Let $G \subset \mathbb{R}^n$ be an open and connected domain. For a vector field $v : G \to \mathbb{R}^n$ and $x \in G$
denote with $v(x) = (v^1(x),\dots,v^n(x))$ its coordinates. Let $dv_{ij}(x) = \partial_i v^j(x)$ denote the
Jacobian matrix of v at x. Let $\|v\|_{H^1}$ denote the Sobolev norm obtained from the inner
product $\langle v,w \rangle_{H^1} = \int_G v(x) \cdot w(x) + \mathrm{tr}(dv^T(x) \, dw(x)) \, dx$ on smooth vector fields and let $H^1(G)$ be
the Hilbert space obtained by completing this set of vector fields with respect to that norm.
Let $(\partial_i v^j + \partial_j v^i)/2$ be abbreviated as $d^s v(x) = (dv^T(x) + dv(x))/2$ and denote the symmetric
part of the Jacobian matrix at x. Let $\|v\|_{\tilde{H}^1}$ denote the symmetrized Sobolev norm obtained
from the inner product $\langle v,w \rangle_{\tilde{H}^1} = \int_G v(x) \cdot w(x) + \mathrm{tr}(d^s v^T(x) \, d^s w(x)) \, dx$. This means that the
Hilbert-Schmidt product $\mathrm{tr}(A^T B)$ of the Jacobian matrices A = dv and B = dw is replaced
by the Hilbert-Schmidt product of the symmetrized Jacobian matrices $d^s v$ and $d^s w$.


Theorem: There exists C = C(G) such that $\|v\|_{H^1} \leq C \|v\|_{\tilde{H}^1}$.


This inequality is called the Korn inequality. It is used in linear elasticity and continuum
mechanics. The constant C is called the Korn constant of G. The inequality had first been
established by Arthur Korn in 1909 in the case $G = \mathbb{R}^n$, where for smooth vector fields
v of compact support, we have using integration by parts
$\int_G |d^s v(x)|^2 dx = \int_G |dv(x)|^2/2 \, dx + \int_G (\mathrm{div}(v))^2/2 \, dx$ so that
the constant C = 2 would do. See [523]. The inequality has been generalized to $W^{1,2}(G)$ if the
region is bounded with Lipschitz boundary. It has also been generalized to other Sobolev spaces
$W^{1,p}(G)$ for $p \in (1, \infty)$ if the boundary is smooth enough. It fails for $p = 1, \infty$. Arthur Korn was
a German physicist born in 1870. He was also an inventor, involved in the development of the
fax machine and Bildtelegraph which were early television systems, as well as a mathematician
working on partial differential equations. He had been dismissed from his post in 1935 and left
Germany for the US, working at the Stevens Institute of Technology in Hoboken. For more on
the inequality, see [128].
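
The integration by parts step in the case of the whole space can be spelled out as follows. This is a sketch for smooth vector fields v of compact support (an assumption made so that no boundary terms appear), with $d^s v = (dv^T + dv)/2$ denoting the symmetric part of the Jacobian:

```latex
\begin{aligned}
|d^s v|^2 &= \frac{1}{4} \sum_{i,j} (\partial_i v^j + \partial_j v^i)^2
           = \frac{1}{2} |dv|^2 + \frac{1}{2} \sum_{i,j} \partial_i v^j \, \partial_j v^i , \\
\int_{\mathbb{R}^n} \sum_{i,j} \partial_i v^j \, \partial_j v^i \, dx
          &= \int_{\mathbb{R}^n} \sum_{i,j} \partial_j v^j \, \partial_i v^i \, dx
           = \int_{\mathbb{R}^n} (\mathrm{div}\, v)^2 \, dx \geq 0 .
\end{aligned}
```

Integrating the first line and inserting the second gives $\int |dv|^2 \, dx \leq 2 \int |d^s v|^2 \, dx$, which is where a constant like 2 comes from.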


224. TWIN PRIMES 


A pair (p, q = p+2) of two rational primes is called a prime twin. An example is (p,q) =
(5,7). One might have wondered since antiquity about the infinitude of prime twins. The twin
prime conjecture claims that infinitely many prime twins exist. The first known source about
the conjecture is Alphonse de Polignac in 1849 so that the conjecture is sometimes also called
the Polignac conjecture. Let $\pi_2(x)$ denote the number of twin primes up to x. The sieve
bound has first been established by Viggo Brun who showed $\pi_2(x) = O(x (\log\log(x)/\log(x))^2)$.


Let $\mathrm{Li}_2(x) = \int_2^x \frac{dt}{\log^2(t)}$ and $S = 2 \prod_{p \, \mathrm{prime}, \, p \geq 3} (1 - 2/p)(1 - 1/p)^{-2}$. The sieve bounds theorem is


Theorem: There is a constant C with $\pi_2(x) \leq C S \, \mathrm{Li}_2(x)$.


The constant S has a probabilistic background. It is $S = \prod_{p \, \mathrm{prime}} S_p$ where $S_2 = (1 - 1/2)(1 - 1/2)^{-2} = 2$
and $S_p = (1 - 2/p)(1 - 1/p)^{-2}$ for $p \geq 3$. One expects then from a probabilistic
point of view a prime twin density of $S x/\log^2(x)$. The sieve bound implies that the sum
$\sum (1/p + 1/(p+2)) = (1/3+1/5) + (1/5+1/7) + \cdots \approx 1.902$ of all reciprocals 1/p of all twin primes
converges. The limit is called Brun’s constant. In the context of the twin
prime conjecture there is Chen’s theorem telling that there are infinitely many primes p
such that p + 2 has at most 2 prime factors. Zhang’s theorem from 2014 about the existence
of infinitely many bounded gaps has been pushed further: there are infinitely many pairs (p, q)
of distinct primes such that $|p - q| \leq 246$. See [472] for a recent review.
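
The quantities in this section are easy to explore numerically. The following sketch (function names are ours) lists twin prime pairs with a simple sieve, approximates the constant S by a truncated product, and computes partial sums of the series defining Brun's constant; the truncation limits are arbitrary choices.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]


def twin_pairs(x):
    """All pairs (p, p + 2) of primes with p + 2 <= x."""
    ps = primes_up_to(x)
    pset = set(ps)
    return [(p, p + 2) for p in ps if p + 2 in pset]


def twin_constant(limit):
    """Truncated product S = 2 * prod_{3 <= p <= limit} (1 - 2/p) / (1 - 1/p)^2."""
    s = 2.0
    for p in primes_up_to(limit):
        if p >= 3:
            s *= (1 - 2 / p) / (1 - 1 / p) ** 2
    return s


def brun_partial_sum(x):
    """Partial sum of 1/p + 1/(p+2) over the twin prime pairs up to x."""
    return sum(1 / p + 1 / q for p, q in twin_pairs(x))
```

The partial sums of the Brun series grow very slowly towards the value near 1.902; for x = 100 the sum over the 8 pairs is still only about 1.33.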


225. AUCTION THEORY 


A real $n \times m$ signal matrix $S = S_{ik}$ for n buyers=bidders and m goods=merchandise
encodes real signal values $S_{ik}$ which buyer i can observe about the good k. Fixed also beforehand
is $T_{ik}$, the set of S matrix entries which buyer i can see for good k. Buyers have private
values if they do not see what others do and common values if they do see all what others
can observe about k. A valuation matrix V evaluates the signals relevant to the i’th buyer’s
evaluation $V_{ik}$ for good k. It defines a welfare or value system $V_i(S) = \sum_{j \in T_i} V_{ij}$. Given a
payment $P_i(S)$, the utility is the difference $U_i(S) = V_i(S) - P_i(S)$ of value minus payment.
A strategy $\Sigma_i$ of buyer i consists of defining V(S) given the constraint T of what they can see.
A pure strategy is a deterministic choice of V, meaning that buyers do not randomize. Given
P defined by the auction, its expected utility is denoted by $U_i(\Sigma)$. A strategy $\Sigma^*$ is a Nash
equilibrium if all buyers optimize their own utility, meaning that $U_i(\Sigma^*) \geq U_i(\Sigma)$ for all $\Sigma$.
An auction with a Nash equilibrium is called effective if it is a Nash equilibrium for which U
is a global maximum. The problem is to find conditions and mechanisms which lead to
Nash equilibria or even effective equilibria. The auction process consists of a bidding that
allows buyers to form a strategy to find the value V, an allocation process assigning goods
to buyers according to V and then define a payment P leading to the utility U. The goal is to
find an auction process which leads to an effective Nash equilibrium. A Vickrey auction is an
auction process for private values and one good, a Vickrey-Clarke-Groves auction (VCG)
extends this to several goods.


Theorem: There is a VCG bidding leading to an effective Nash equilibrium 


Auction theory is a chapter in game theory and is part of mathematical economics. It deals
with the problem to use a bidding setup to allocate goods among buyers who bid for a fair price.
It is a way to discover a correct price for a good. Game theory started with von Neumann’s
paper of 1928. Von Neumann and Morgenstern developed it in their book in 1944. The
concept of Nash equilibrium was introduced by John Nash in 1950 (see e.g. [468]). In game
theoretical settings, this means that players choose strategies from which unilateral deviations
from the strategy do not pay better. The Vickrey auction from 1961 in which "the highest
bidder wins but pays the second highest bid", is a private auction where each person’s bid only
depends on its own value. The theory has shown to be so valuable that Vickrey was awarded
a Nobel prize in economics for his work. See [487].
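
The rule "the highest bidder wins but pays the second highest bid" is simple enough to simulate. The sketch below models a single good with private values; the tie-breaking rule (the earlier bidder wins) and the function names are our assumptions. It lets one check on examples that bidding one's true value is never worse than any other bid, whatever the others do.

```python
def second_price_auction(bids):
    """Return (winner index, price): the highest bidder wins and pays the second highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    price = bids[order[1]]
    return winner, price


def utility(value, my_bid, other_bids):
    """Utility (value minus payment) of bidder 0, who has the given private value."""
    bids = [my_bid] + list(other_bids)
    winner, price = second_price_auction(bids)
    return value - price if winner == 0 else 0


# Compare truthful bidding with all integer deviations for one example.
value, others = 7, (5, 9)
truthful = utility(value, value, others)
deviations = [utility(value, bid, others) for bid in range(0, 13)]
```

In this example the truthful bidder loses and gets utility 0, while overbidding above 9 would win the good at price 9 and give utility -2.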


226. WIENER’S 1/F THEOREM 


The Wiener algebra A(T) is the set of continuous $2\pi$-periodic functions f with absolutely
convergent Fourier series $f(x) = \sum_{n \in \mathbb{Z}} c_n e^{inx}$. Equipped with the norm $\|f\| = \sum_{n \in \mathbb{Z}} |c_n| < \infty$,
it is a commutative Banach algebra, meaning $\|f \cdot g\| \leq \|f\| \cdot \|g\|$. It is not a C* algebra
however; one would have to change the norm to the supremum norm which then completes
to the larger set C(T). The algebra consists of mildly regular continuous functions because
$C^{\alpha}(T) \subset A(T) \subset C(T)$ for all $\alpha > 1/2$. The Fourier transform $f \in A(T) \to \hat{f} \in l^1(\mathbb{Z})$ is an
isomorphism of Banach algebras. Wiener’s 1/f theorem is


Theorem: $f \in A(T)$ and $f(x) \neq 0, \forall x \Rightarrow 1/f \in A(T)$.


Wiener proved this in 1932 ([695], Lemma IIe). The theorem has been called “one of the nicest
applications of the theory of Banach algebras to harmonic analysis". It is also known as
the Wiener-Lévy theorem as Paul Lévy extended the result showing that for any function
$\phi$ that is analytic on the image of f, the function $\phi(f(x))$ is in A(T) [445]. Lévy gives the
example $f(x) = 1/\log|\sin(x)/2|$ which has Fourier coefficients $c_n \sim 4/(n \log^2(n))$ and so
has an absolutely convergent Fourier series. Its derivative $f'(x) = -f^2(x) \cot(x)$ is no more
continuous. The 1/f theorem was proven by Israel Gelfand in 1939 using the structure theorem
for commutative Banach algebras [720]: it uses the fact (actually a lemma in Gelfand’s
first paper on normed rings in 1939) that in order that an element in a normed ring has an
inverse, it is necessary and sufficient that it is not in a maximal ideal (if f has an inverse then
it does not belong to any maximal ideal J as then $1 \in J$ and so every $g \in J$; on the other hand
if f does not have an inverse then $J = \{gf, g \in A(T)\}$ is an ideal which is not the entire ring
and so is contained in a maximal ideal different from the ring). It also uses that the maximal
ideals in A(T) are the sets of functions f which vanish at some point $t_0 \in T$. So, functions
which do not vanish are not contained in a maximal ideal and so are invertible. A short direct
proof is given by Donald Newman in [516] using the inequality $|f|_\infty \leq \|f\| \leq |f|_\infty + 2|f'|_\infty$
(which holds for differentiable f and in particular for finite partial Fourier sums): if $f \in A(T)$
is given which is nowhere 0, scale it so that $|f(x)| \geq 1$ and take a partial sum P such that
$\|P - f\| < 1/3$. Now look at the geometric sum $S(x) = \sum_{n=1}^{\infty} (P(x) - f(x))^{n-1}/P(x)^n$ which
converges because $|P(x)| \geq 2/3$ and $\|1/P^n\| \leq (3n|P'|_\infty + 1)(3/2)^n$. Because the geometric series
converges to $S(x) = (1/P)(1/(1 - (P - f)/P)) = 1/f(x)$, the theorem is proven.


227. WELL ORDERING THEOREM 


A set X is called well-ordered if there is a total order on X such that every non-empty subset
$Y \subset X$ has a least element. [A total order is a binary relation $\leq$ that satisfies antisymmetry
($a \leq b$ and $b \leq a$ implies a = b), reflexivity ($x \leq x$), transitivity ($a \leq b$ and $b \leq c$ implies
$a \leq c$) and connexity ($a \leq b$ or $b \leq a$). Without connexity, one only has a partial order.
The least element of a set Y in a totally ordered set is an element $y \in Y$ such that $y \leq z$
for all $z \in Y$.] The well ordering theorem is, like Zorn’s lemma or Tychonov’s theorem,
equivalent to the axiom of choice and leads to seeming paradoxes like the Banach-Tarski
paradox, telling that one can partition the unit ball in $\mathbb{R}^3$ into 5 disjoint sets such that three of
them can be translated and rotated to become the unit ball again and the 2 remaining can be
translated and rotated to become the unit ball again. This is a paradox because the doubling
of the ball is incompatible with volume.


Theorem: Every set can be well ordered. 


The theorem was suggested by Georg Cantor in 1883 [106] and proven first by Ernst Zermelo
in 1904 who called it the “true fundament of the whole theory of number". The integers Z
can be well ordered (we write $\ll$ to distinguish from the usual <) for example with $x \ll y$ if
|x| < |y|, or |x| = |y| and either x = y or x < 0, y = -x > 0. By using a bijection from Q to Z,
also Q can be well ordered as such. But this does not work for R any more. König in 1904 at
the Heidelberg Congress, gave on August 9th a wrong proof that the real numbers can not be
well ordered. Already on August 10th, Zermelo pointed out an error in König’s argument. It
was Felix Hausdorff who found an essential problem in September 1904 in a letter to Hilbert
and also Cantor pointed to a problem. Hilbert, Hensel, Hausdorff and Schoenflies had met in
Wengen (in the Swiss alps) at a successor congress. König then in October 1904 revoked his
Heidelberg proof. On September 24 1904, Zermelo found the proof of the well-ordering theorem
in Münden, near Göttingen and acknowledges Erhard Schmidt for the idea to base it on the
axiom of choice. The letter is printed in [189], where it is also pointed out that the proof was
the object of intensive criticism which only ebbed after decades. While the fact that R can be well
ordered is a consequence of the well ordering theorem, it is impossible to explicitly construct
such an ordering without assuming an axiom of constructibility. See [189] (section 2.5, 2.6)
or [188]. The well ordering theorem is now known in the history of mathematics as one of the
greatest mathematical controversies of all times.
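
The well ordering of the integers described above, 0, -1, 1, -2, 2, ..., can be expressed as a sort key. The following sketch (names are ours) compares first by absolute value and then puts the negative number before the positive one, so that every non-empty finite sample of integers has a least element with respect to this order.

```python
def well_order_key(x):
    """Key realizing the order 0, -1, 1, -2, 2, ...: compare |x| first, negatives before positives."""
    return (abs(x), x > 0)


def least(subset):
    """Least element of a non-empty collection of integers in this ordering."""
    return min(subset, key=well_order_key)


ordered = sorted(range(-3, 4), key=well_order_key)
```

In contrast to the usual order on Z, a set like the negative integers now has a least element, namely -1.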


228. CARISTI-KIRK-EKELAND THEOREM 


Let (X, d) be a complete metric space and $T : X \to X$ an arbitrary map, not necessarily
continuous. The map T satisfies an inward condition if there exists a lower semi-continuous
function $f(x) \geq 0$ such that $d(x,T(x)) \leq f(x) - f(T(x))$. [A function f is lower
semi-continuous if limits only can “jump down", that is if $f(a) \leq \liminf_{x \to a} f(x)$ for every a. For
example f(x) = -1 for $x \leq 0$ and f(x) = 1 for x > 0 is lower semi-continuous but not
continuous. f is lower semi-continuous if and only if -f is upper semi-continuous. If f is both
lower and upper semi-continuous, then f is continuous.]


Theorem: If T satisfies an inward condition, then T has a fixed point.


An example is if we take $y \in X$ and f(x) = d(x,y). The condition then means $d(x,T(x)) \leq
d(x,y) - d(T(x), y)$, implying d(T(x),y) < d(x,y) if $x \neq T(x)$, justifying the name “inward
condition". In general, the condition means $f(T(x)) \leq f(x) - d(x,T(x)) < f(x)$ as long as
$x \neq T(x)$. The sequence $x_n = T^n(x)$ has the property that $y_n = f(x_n) \geq 0$ is decreasing,
so that f is some sort of Lyapunov function. The sequence $y_n$ then converges to a limit y and,
since $\sum_n d(x_n, x_{n+1}) \leq f(x)$ is finite, the $x_n$ form a Cauchy sequence which converges by
completeness of X. The theorem is easier to see if f is continuous
because the limit point x then satisfies f(x) = y and, if f(T(x)) = f(x), the inward condition
gives d(x,T(x)) = 0 so that T(x) = x. James Caristi [109] (Theorem 2.1’) mentions that the theorem was
suggested by Felix Browder and that I. Ekeland has proven an equivalent theorem in 1972 ([203]
Theorem 1) as an abstraction of a lemma of Bishop and Phelps. W.A. Kirk, who was the PhD
advisor of Caristi, proved already a related theorem in 1965 [384]: Kirk assumed that X is a
bounded convex subset of a reflexive Banach space with a normal structure [for every convex
subset H of X with more than one point, there is a point that is not a diametral point. A
diametral point in H is a point x for which the supremum $\sup_{y \in H} \|x - y\|$ is
the diameter of H] and that T does not increase distances. Caristi’s statement is more general
and elegant in comparison with the results of Ekeland and Kirk who were more concerned with
convex analysis.


229. SHAPLEY-FOLKMAN THEOREM 


The Minkowski addition of two subsets A, B in $V = \mathbb{R}^d$ is defined as the set $\{a+b \mid a \in A, b \in
B\}$. A set A in V is called convex, if for any two points $x,y \in A$ also the connecting interval
$\{x+t(y-x), t \in [0,1]\}$ is part of A. The convex hull of a set A is the smallest subset
c(A) of V containing A which is convex. A set A is convex if and only if the Minkowski distance d(A, c(A))
of A to its convex hull c(A) is zero. One has in general the relation c(A + B) = c(A) + c(B).
Let us call a sequence of sets $A_k$ uniformly bounded if there is a ball $B = B_r$ such that the $A_k$ are
all subsets of B. Define the Minkowski average $S_n = \frac{1}{n} \sum_{k=1}^n A_k$. The Shapley-Folkman
theorem is:


Theorem: If $A_k$ are uniformly bounded then $d(S_n, c(S_n)) \to 0$.


This is some sort of a law of large sets in the sense that the Minkowski average converges
to the “average" which is the convex hull. There are uniform bounds for the distance which do
not depend on n as long as n > d. For convex analysis, see [204]. For convexity in economics,
see [270].
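
A one-dimensional toy case illustrates the statement. In the sketch below all sets are equal to the two-point set {0, 1}, which is an assumption made purely for illustration; the Minkowski average of n copies is then {0, 1/n, ..., 1}, and its distance to the convex hull [0, 1] is half the largest gap. The function names are ours.

```python
from fractions import Fraction


def minkowski_sum(A, B):
    """Minkowski sum {a + b} of two finite sets of numbers."""
    return {a + b for a in A for b in B}


def minkowski_average(sets):
    """Minkowski average (A_1 + ... + A_n) / n of finite subsets of R."""
    total = {Fraction(0)}
    for A in sets:
        total = minkowski_sum(total, A)
    return {s / len(sets) for s in total}


def distance_to_hull(S):
    """Largest distance from a point of the hull [min S, max S] to the finite set S."""
    pts = sorted(S)
    if len(pts) < 2:
        return Fraction(0)
    return max((b - a) / 2 for a, b in zip(pts, pts[1:]))


distances = [distance_to_hull(minkowski_average([{0, 1}] * n)) for n in range(1, 6)]
```

The computed distances are 1/2, 1/4, 1/6, 1/8, 1/10, so they tend to zero as more sets are averaged.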


230. DIRICHLET’S UNIT THEOREM 


An element r in a ring R is called a unit if it has an inverse. The units form a group called
the group of units. In a division ring R, it is the multiplicative group $R \setminus \{0\}$. [In a normed
division algebra R one sometimes calls the elements of norm 1 “units". Units here are all
invertible elements in a ring.] An algebraic number
field is an algebraic field extension of the field of rational numbers Q. Let $O_K$ be the ring
of integers of the number field K. The degree of K is the dimension of K as a vector space over
Q. For quadratic fields $Q(\sqrt{d})$ for example, the degree is 2 if the integer d is not a square.
The field K is the field of fractions of $O_K$. Dirichlet’s unit theorem tells:


Theorem: The group of units in a ring of integers is finitely generated.


The rank r of a ring is the maximal number of multiplicative independent elements in the 
group of units. It is r = 1r,+r2—1, where r; is the number of real embeddings (the number of 
real conjugates of a primitive element) and 2r2 is the number of conjugates which are complex. 
For a ring of integers in a real quadratic field like Z[,/5|, the rank is 1. In an imaginary 
quadratic field like the ring of Gaussian integers Z[/—1], the rank is 0. For all other fields, 
the rank is larger than 1. For algebraic number theory [436] [512] [631]. A relatively short proof 
of the unity theorem can be found in [626]. 
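
For real quadratic fields the rank 1 statement can be made concrete. The following is a minimal sketch, assuming the ring $\mathbb{Z}[\sqrt{d}]$ (which is the full ring of integers when $d$ is square free and $d \equiv 2, 3$ mod 4). The function names `fundamental_unit` and `unit_power` are hypothetical. Units $x + y\sqrt{d}$ are exactly the solutions of $x^2 - d y^2 = \pm 1$, and the one with the smallest $y \geq 1$ generates all others up to sign.

```python
from math import isqrt


def fundamental_unit(d, y_max=10**6):
    """Smallest unit x + y*sqrt(d) > 1 in Z[sqrt(d)] for a non-square d > 1.

    Returns (x, y, norm) with x*x - d*y*y == norm and norm in {-1, 1}.
    """
    for y in range(1, y_max):
        for norm in (-1, 1):
            x2 = d * y * y + norm
            x = isqrt(x2)
            if x * x == x2:
                return x, y, norm
    raise ValueError("no unit found below y_max")


def unit_power(x, y, d, n):
    """Coefficients (a, b) of (x + y*sqrt(d))**n = a + b*sqrt(d)."""
    a, b = 1, 0
    for _ in range(n):
        a, b = a * x + b * y * d, a * y + b * x
    return a, b


for d in (2, 3, 7):
    print(d, fundamental_unit(d))
```

All powers of the unit found are again units, which illustrates that the unit group contains a copy of $\mathbb{Z}$.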


231. SPECTRUM OF A COUNTABLE THEORY 


Model theory investigates how a formal theory built from sentences in a formal language is
modeled and interpreted in concrete structures. A theory $T$ is a set of sentences in a language
$L$. A model $M$ of $T$ is a set with interpretations of the function, relation and constant symbols
of that language $L$ in which all sentences of $T$ hold. A theory $T$ is complete if for every sentence
of $L$, either the sentence or its negation is a consequence of $T$. The spectrum of a complete theory
$T$ is the number $I(T,k) = I(k)$ of isomorphism classes of models as a function of the cardinality $k$.
The Löwenheim-Skolem theorem tells for a theory in a countable language that if $I(T,k) > 0$ for
some infinite $k$, then $I(T,k) > 0$ for all infinite cardinalities $k$. First order logic theories can
therefore not control the cardinality of their infinite models. The theorem also shows that a theory
with arbitrary large finite models must have an infinite model. A theory $T$ is called $k$-categorical
if $I(T,k) = 1$, meaning that it has only one model of cardinality $k$ up to isomorphism.
Löwenheim-Skolem shows that a first order theory with an infinite model is not categorical: it
can not have only one model up to isomorphism. This also follows from Gödel's incompleteness
theorem. Jerzy Łoś conjectured in 1954 that if $I(T,k) = 1$ for some uncountable $k$, then
$I(T,k) = 1$ for all uncountable $k$. In other words, if a theory is $k$-categorical for an uncountable
power $k$, then it is $k$-categorical for every uncountable power $k$. It was proven in 1965 by
Michael Morley [500] and is called Morley's categoricity theorem.


Theorem: $I(T,k) = 1$ for some uncountable $k$ $\Rightarrow$ $I(T,k) = 1$ $\forall$ uncountable $k$.


The theorem is remarkable in comparison with the Löwenheim-Skolem theorem which tells
that if a theory in a countable language has a countably infinite model, then it has a model of
any infinite cardinality. The categoricity theorem is considered the beginning of modern model
theory. Michael Morley, who died on October 16, 2020, had won the 2003 Steele prize for seminal
contributions to research for his paper [500], which had been initiated when writing his PhD
thesis in 1962. See [464] chapter 6 and [600].


232. PEANO AXIOMS 


The Dedekind-Peano axioms (PA) formalize the arithmetic of the natural numbers $\mathbb{N}$. The
axiom system first lists five axioms that are already true in first order logic with equality. The
next three axiomatize the successor function $S$: 1) for every $n \in \mathbb{N}$, there is a successor
$S(n)$, 2) $S$ is injective and 3) there is no $n$ with $S(n) = 0$. And then there is the axiom of
induction: 4) if $K$ is a set such that $0 \in K$ and $n \in K$ implies $S(n) \in K$, then
$\mathbb{N} \subset K$. Not all statements which are true for the natural numbers can be proven from
the Peano axioms. Kurt Gödel already established the existence of statements in PA that are true
but unprovable within PA. An accessible and natural example has been given by Jeff Paris and Leo
Harrington [531]: a finite set $H \subset \mathbb{N}$ is called relatively large if
$card(H) \geq \min(H)$. Given a finite set $M \subset \mathbb{N}$ and $e,r,k \in \mathbb{N}$, let
$F(M,k,r,e)$ denote the statement: for every coloring map $P : M^e \to \{1,\dots,r\}$ (producing a
partition of $M^e$, the set of $e$-element subsets of $M$), there is a relatively large $H \subset M$
with $card(H) \geq k$ such that $P$ is constant on $H^e$ (there is only one color on $H$). The extended
finite Ramsey theorem is "for all $e,r,k \in \mathbb{N}$, there exists $M$ such that $F(M,k,r,e)$ holds".


Theorem: The extended finite Ramsey theorem is not provable in PA. 
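
For very small parameters the statement $F(M,k,r,e)$ can be checked by brute force. The following is a minimal sketch, assuming that $M^e$ is read as the set of $e$-element subsets of $M$ and that colors are labeled $0,\dots,r-1$. The names `relatively_large` and `statement_F` are hypothetical. The search is exponential and only meant to illustrate the definitions.

```python
from itertools import combinations, product


def relatively_large(H):
    """A finite set H of natural numbers is relatively large if card(H) >= min(H)."""
    return len(H) >= min(H)


def statement_F(M, k, r, e):
    """Check F(M, k, r, e) by enumerating all colorings of the e-subsets of M."""
    M = sorted(M)
    e_subsets = list(combinations(M, e))
    # candidate sets H: relatively large subsets of M with at least k elements
    candidates = [H for size in range(max(k, 1), len(M) + 1)
                  for H in combinations(M, size) if relatively_large(H)]
    for colors in product(range(r), repeat=len(e_subsets)):
        P = dict(zip(e_subsets, colors))
        # H is homogeneous if all its e-subsets carry the same color
        if not any(len({P[s] for s in combinations(H, e)}) <= 1 for H in candidates):
            return False  # this coloring is a counterexample
    return True


print(statement_F([1, 2, 3], k=2, r=2, e=1))  # pigeonhole gives a pair of equal color
print(statement_F([3, 4, 5], k=2, r=2, e=1))  # pairs are homogeneous but not relatively large
```

The second call shows the effect of the extra largeness condition: a monochromatic pair inside $\{3,4,5\}$ always exists, but it has fewer elements than its minimum.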


Paris and Harrington point out that when working with natural numbers, working in PA
amounts to replacing the axiom of infinity by its negation in ZF. They then first give a proof
of the extended finite Ramsey theorem as follows (we write $\mathbb{N}$ for $\omega$, the order type of
$\mathbb{N}$): fix $e,r,k$ and assume there is no such $M$. For every finite $M$ there is then a
counterexample map $P : M^e \to \{1,\dots,r\}$, for which there is no relatively large homogeneous
set of size at least $k$. The set of counterexamples is a graph, where $(P, M),(Q, N)$ are connected
if $M \subset N$ and $P$ is the restriction of $Q$ to $M$. This is an infinite tree with finite vertex
degree at every point. By König's lemma, there is $P : \mathbb{N}^e \to \{1,\dots,r\}$ such that for
every $M \subset \mathbb{N}$ the restriction of $P$ to $M^e$ is a counterexample for $M$. By the infinite
Ramsey theorem, there is an infinite $H \subset \mathbb{N}$ that is homogeneous for $P$. By choosing
$M$ large enough (compared to $k$, $\min(H)$), the set $H \cap M$ is a relatively large homogeneous
set for $P|M^e$ of size at least $k$. This contradiction finishes the proof of the extended finite
Ramsey theorem. The proof of the Paris-Harrington theorem uses model theoretic techniques and
Gödel's incompleteness theorem. Paris and Harrington define a "beefed up" theory $T$ and show that
the consistency of $T$ implies the consistency of PA using PA only. Then they show that the
extended finite Ramsey theorem implies the consistency of $T$ and so the consistency of PA.
If the extended finite Ramsey theorem were provable in PA, this would contradict Gödel's
incompleteness theorem: one can not prove the consistency of PA within PA. Laurence Kirby and
Jeff Paris have produced even more accessible examples [383], especially the Hydra game, a game
in which the player has to cut off heads of a tree, to which the tree reacts by growing multiple
copies of branches. The theorem is that the player always wins. The surprise is that one can not
prove this within PA. More examples are in [618].


233. SIMPLICIAL SETS 


The simplex category $\Delta$ has the simplices $[n] = \{0,1,\dots,n\}$ (non-empty totally ordered
finite sets) as objects and order preserving maps between them as morphisms. A simplicial
set is a contravariant functor $\Delta \to Set$. Simplicial sets form a category called sSet. More
generally, a simplicial object is a contravariant functor from $\Delta$ to another category $C$. A
coface map $d^i : \{0,\dots,n\} \to \{0,\dots,n+1\}$ is the unique order preserving injection for which
the element $i$ is omitted in the image. The codegeneracy map $s^i : \{0,\dots,n+1\} \to \{0,\dots,n\}$
is the order preserving surjection which duplicates the element $i$, meaning that $s^i(j) = j$ if
$0 \leq j \leq i$ and $s^i(j) = j-1$ if $i < j \leq n+1$. There is now a decomposition lemma:


Theorem: Morphisms in $\Delta$ are compositions of coface and codegeneracy maps.


This simple lemma is important to appreciate the axiomatic description of simplicial sets given
first by May in 1967. See [802] [561]. It tells that every morphism $f : \{0,\dots,n\} \to \{0,\dots,m\}$
has a unique representation $f = d^{i_1} \cdots d^{i_k} s^{j_1} \cdots s^{j_h}$ with $n+k-h = m$ and
$m \geq i_1 > \cdots > i_k \geq 0$ and $0 \leq j_1 < \cdots < j_h < n$. One has the relations
$d^j d^i = d^i d^{j-1}$ for $i < j$ and $s^j s^i = s^i s^{j+1}$ for $i \leq j$ and
$s^j d^i = d^i s^{j-1}$ for $i < j$, $s^j d^i = 1$ if $i = j, j+1$ and $s^j d^i = d^{i-1} s^j$ for
$i > j+1$. In the opposite category $\Delta^{op}$ (the category with the same objects but reversed
morphisms), the morphisms are denoted by $d_i, s_i$ and called face and degeneracy maps. All
morphisms in $\Delta^{op}$ are now generated by composites of $d_i, s_i$. It follows that a
contravariant functor $X$ from $\Delta$ to $C$ is determined by the images $X_n = X(\{0,\dots,n\})$ of
the simplices and by the face and degeneracy maps $d_i$ and $s_i$. A simplicial set therefore is a
sequence of sets $X_n$ together with functions $d_i : X_n \to X_{n-1}$ and $s_i : X_n \to X_{n+1}$
satisfying the composition relations for $d_i, s_i$. The elements of $X_0$ are called the vertices,
the elements of $X_k$ are called the $k$-simplices. A simplex in the image of some $s_i$ is called a
degenerate simplex. An advantage of looking at simplicial sets rather than simplicial
complexes is that one can use the framework in any category and that the Cartesian product of
simplicial sets is a simplicial set. The covariant geometric realization functor $X \to |X|$ from
sSet to Top is the left adjoint of the singular functor from Top to sSet, which assigns to a
topological space its singular simplicial set. Another example is that the nerve $N(C)$ of a small
category $C$ is a simplicial set constructed from the objects and morphisms of $C$. A functor
$f : C \to D$ between two categories induces then a map of the corresponding simplicial sets and a
natural transformation between two functors induces a homotopy between the induced maps. The
geometric realization of $N(C)$ is called the classifying space of $C$. In general, for locally finite
simplicial sets one has $|X \times Y| = |X| \times |Y|$ in Top and the geometric realization $|X|$ of a
simplicial set $X$ in Euclidean space is a CW complex. In 1950, Eilenberg and Zilber introduced
semisimplicial complexes (see [595]), a terminology which later morphed into simplicial sets.
According to [802], every simplicial set can be subdivided to become a simplicial complex so that
the realization of every simplicial set is homeomorphic to a simplicial complex. But similarly as
with CW complexes, simplicial sets allow computations with fewer simplices. The homotopy category
of topological spaces of the homotopy type of a CW complex is equivalent to the homotopy category
of simplicial sets which satisfy an extension condition. For more literature, see [471] [253] [453].
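
The relations between coface and codegeneracy maps can be confirmed by brute force. The following is a minimal sketch, assuming the standard form of the cosimplicial identities and treating $d^i$ and $s^i$ as functions on all non-negative integers (their formulas do not depend on $n$). The helper names are hypothetical.

```python
def d(i):
    """Coface map d^i: order preserving injection which skips the value i."""
    return lambda x: x if x < i else x + 1


def s(i):
    """Codegeneracy map s^i: order preserving surjection which hits i twice."""
    return lambda x: x if x <= i else x - 1


def compose(f, g):
    """Return the composition f after g."""
    return lambda x: f(g(x))


def same(f, g, domain=range(12)):
    """Compare two maps on an initial segment of the natural numbers."""
    return all(f(x) == g(x) for x in domain)


def check_identities(limit=6):
    """Verify the cosimplicial identities for all indices below limit."""
    for i in range(limit):
        for j in range(limit):
            if i < j:
                assert same(compose(d(j), d(i)), compose(d(i), d(j - 1)))
                assert same(compose(s(j), d(i)), compose(d(i), s(j - 1)))
            if i <= j:
                assert same(compose(s(j), s(i)), compose(s(i), s(j + 1)))
            if i in (j, j + 1):
                assert same(compose(s(j), d(i)), lambda x: x)
            if i > j + 1:
                assert same(compose(s(j), d(i)), compose(d(i - 1), s(j)))
    return True


print(check_identities())
```

Such a finite check is of course no proof, but it quickly exposes index errors when the relations are written down.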


234. OSTROWSKI THEOREM 


An absolute value on $\mathbb{Q}$ is a function $x \in \mathbb{Q} \to |x| \in \mathbb{R}^+$ with the
property that $|x| = 0$ is equivalent to $x = 0$, which is compatible with multiplication
$|xy| = |x||y|$ and which satisfies the triangle inequality $|x+y| \leq |x| + |y|$. The trivial norm
$|x|_0 = 1$ for $x \neq 0$ and $|0|_0 = 0$ is not considered. An example is the usual absolute value
$|x|_\infty$ or the $p$-adic absolute value $|x|_p = |p^n \frac{u}{v}|_p = p^{-n}$ if $p,u,v$ are
pairwise coprime integers and where $p$ is a rational prime. The $p$-adic norm $|x|_p$ is a
non-Archimedean absolute value in the sense that the ultrametric property
$|x+y| \leq \max(|x|, |y|)$ holds. Two different absolute values are equivalent if $|x|_1 = |x|_2^c$
for some positive constant $c$. An equivalence class of non-trivial absolute values on $\mathbb{Q}$ is
a place.


Theorem: Every place $|x|$ is either $|x| = |x|_\infty$ or $|x| = |x|_p$ with prime $p$.


Alexander Ostrowski proved this theorem in 1916 [524]. It shows that every field containing
$\mathbb{Q}$ which is complete with respect to an Archimedean absolute value is either $\mathbb{R}$ or
$\mathbb{C}$. Another curious consequence is the product formula $\prod_{p \leq \infty} |x|_p = 1$
for $x \in \mathbb{Q} \setminus \{0\}$ which combines all possible norms $|x|_p$ as well as
$|x|_\infty$. The completion of $\mathbb{Q}$ with respect to $|x|_p$ is the field $\mathbb{Q}_p$ of
$p$-adic numbers. Each number in $\mathbb{Q}_p$ can be written in a unique way as a half infinite
Laurent series $x = \sum_{k=-\infty}^{\infty} a_k p^k$, where the $a_k \in \{0,\dots,p-1\}$ are zero
for $k < n(x)$ and $a_{n(x)} \neq 0$. The norm is then $|x|_p = p^{-n(x)}$. The field $\mathbb{Q}_p$
contains the sub-ring $\mathbb{Z}_p$ of $p$-adic integers $x = \sum_{k=0}^{\infty} a_k p^k$ which is
the unit ball in $\mathbb{Q}_p$. The ring $\mathbb{Z}_p$ of $p$-adic integers has no zero divisors and
$\mathbb{Q}_p$ is the field of fractions of $\mathbb{Z}_p$ (the smallest field containing
$\mathbb{Z}_p$). The $p$-adic integers $G = \mathbb{Z}_p$ with addition and metric coming from the norm
form a commutative compact topological group. It is a totally disconnected perfect metric space
and so a Cantor space, a topological space homeomorphic to the Cantor set. Its Pontryagin
dual group $\hat{G}$ is the Prüfer $p$-group $\mathbb{Q}_p/\mathbb{Z}_p$, the $p$-adic rational numbers
modulo the integers, which is $\hat{G} = \{e^{2\pi i k/p^n} \mid k \in \mathbb{Z}, n \in \mathbb{N}\}$.
While $\mathbb{R}$ and $\mathbb{Q}_p$ are all locally compact, the circle
$\mathbb{T} = \mathbb{R}/\mathbb{Z}$ is a compact, connected metric space which is the dual of the
integers $\mathbb{Z}$, which are non-compact and totally disconnected. The $p$-adic integers
$\mathbb{Z}_p$ are compact, totally disconnected spaces, dual to the Prüfer $p$-group
$P_p = \mathbb{Q}_p/\mathbb{Z}_p$. In both cases, one has Haar measures $\mu$ on $G$ and $\hat{\mu}$ on
$\hat{G}$ and a Fourier isomorphism $f \in L^2(G,\mu) \to \hat{f} \in L^2(\hat{G},\hat{\mu})$ which is
the Pontryagin involution. All group translations and multiplications by some integer preserve the
Haar measure $\mu$. While for $\mathbb{T}$, the translations $x \to x + a$ preserving $dx$ can be
arbitrarily close to the identity, there is a smallest group translation $x \to T(x) = x + 1$ on the
$p$-adic integers. It is called the adding machine and is ergodic. The eigenvalues of the unitary
Koopman operator $f \to f(T)$ on $L^2(\mathbb{Z}_p)$ coincide as a set with the Prüfer group
$P_p \subset \mathbb{T} = \{z \in \mathbb{C}, |z| = 1\}$. Besides group translations, there are also
Bernoulli shifts in both cases which preserve the Haar measure. On the compact topological group
$\mathbb{T} = \mathbb{R}/\mathbb{Z}$, the map $x \to nx$ is a Bernoulli shift for every $n > 1$ with
entropy $\log(n)$. On the compact topological group $\mathbb{Z}_p$, the shift
$\sum_{k \geq 0} a_k p^k \to \sum_{k \geq 0} a_{k+1} p^k$ of the digits, a left inverse of the map
$x \to px$, is a Bernoulli shift with Kolmogorov-Sinai entropy $\log(p)$. About the life of
Ostrowski see [242]. For $p$-adic analysis, see [260].
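
The $p$-adic absolute value and the product formula are easy to experiment with. The following is a minimal sketch using exact rational arithmetic; the function names `padic_abs` and `product_formula` are hypothetical. Only the finitely many primes dividing numerator or denominator contribute a factor different from 1.

```python
from fractions import Fraction


def valuation(n, p):
    """Largest v such that p**v divides the non-zero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def padic_abs(x, p):
    """The p-adic absolute value |x|_p of a rational number x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = valuation(x.numerator, p) - valuation(x.denominator, p)
    return Fraction(p) ** (-v)


def prime_factors(n):
    """Set of primes dividing the non-zero integer n (trial division)."""
    n, p, primes = abs(n), 2, set()
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def product_formula(x):
    """Product of |x|_infinity and all |x|_p for a non-zero rational x."""
    x = Fraction(x)
    result = abs(x)
    for p in prime_factors(x.numerator) | prime_factors(x.denominator):
        result *= padic_abs(x, p)
    return result


x = Fraction(-360, 7)
print([(p, padic_abs(x, p)) for p in (2, 3, 5, 7)], product_formula(x))
```

The ultrametric inequality can be tested the same way, for example with $x = 3/4$, $y = 5/4$ and $p = 2$, where $|x+y|_2 = 1/2$ is much smaller than $|x|_2 = |y|_2 = 4$.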


235. CLIFFORD ALGEBRAS 


If $V$ is a finite dimensional vector space over a field $k$ and $q$ is a quadratic form of signature
$(p,q)$, its Clifford algebra $Cl(V,q)$ is the quotient $T(V)/J$ of the tensor algebra
$T(V) = \bigoplus_{k \geq 0} V^{\otimes k}$ by the ideal $J$ generated by elements
$v \otimes v - q(v)1$. [We use the sign convention of [125] [613]. Other authors, like [240] [439],
prefer to take the ideal generated by $v \otimes v + q(v)1$.] The Clifford algebra is a unital
associative algebra. If the underlying field $k$ is $\mathbb{R}$, it is called a geometric
algebra. For $q = 0$, one obtains the exterior algebra with exterior multiplication, where
$v \otimes w = -w \otimes v$. As turning on $q$ deforms the anti-commutativity relation, one sees
$Cl(V,q)$ as a quantization of the exterior algebra $Ext(V)$. (One can also see the process of going
from $(V,q)$ to $Cl(V,q)$ as a second quantization if one interprets the tensor algebra as a
many-body Fock space.) Examples of Clifford algebras are the complex numbers
$Cl_{0,1}(\mathbb{R}) = \mathbb{C}$, the quaternions $Cl_{0,2}(\mathbb{R})$, the split complex numbers
$Cl_{1,0}(\mathbb{R})$ or the split quaternions $Cl_{2,0}(\mathbb{R})$. Notable in relativity is the
space time algebra $Cl_{1,3}(\mathbb{R})$. Let $i : V \to Cl(V,q)$ be the inclusion map which embeds
$V$ into $Cl(V,q)$ and which satisfies $i(v)^2 = q(v)$. The algebra $Cl(V,q)$ enjoys now a universal
property: given any unital associative algebra $A$ and any linear map $j : V \to A$ obeying
$j(v)^2 = q(v)$, there exists a unique algebra homomorphism $f : Cl(V,q) \to A$ such that
$j(v) = f(i(v))$. The fundamental theorem of Clifford algebras is that $Cl(V,q)$ is unique with
this property. One can speak of "the" Clifford algebra $Cl(V,q)$ defined by $V$ and $q$.


Theorem: The Clifford algebra construct satisfies the universal property. 
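
The defining relation $v^2 = q(v)$ can be tested in a small model. The following is a minimal sketch, assuming a diagonal quadratic form given by a tuple of signs and a representation of basis blades by bitmasks (a common implementation device, not taken from the cited references). The names `blade_product` and `multiply` are hypothetical. With signature $(-1,-1)$ one gets the quaternions $Cl_{0,2}(\mathbb{R})$.

```python
def blade_product(a, b, signature):
    """Multiply two basis blades given as bitmasks; returns (sign, blade).

    Bit i of a mask stands for the basis vector e_(i+1). The sign collects
    the swaps needed to sort the product and the squares e_i * e_i = signature[i].
    """
    swaps, t = 0, a >> 1
    while t:
        swaps += bin(t & b).count("1")
        t >>= 1
    sign = -1 if swaps % 2 else 1
    common, i = a & b, 0
    while common:
        if common & 1:
            sign *= signature[i]
        common >>= 1
        i += 1
    return sign, a ^ b


def multiply(x, y, signature):
    """Product of multivectors stored as {blade bitmask: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            sign, blade = blade_product(a, b, signature)
            out[blade] = out.get(blade, 0) + sign * ca * cb
    return {blade: c for blade, c in out.items() if c != 0}


quaternion_signature = (-1, -1)   # Cl_{0,2}: e1*e1 = e2*e2 = -1
e1, e2 = {0b01: 1}, {0b10: 1}
e12 = multiply(e1, e2, quaternion_signature)
v = {0b01: 3, 0b10: 4}            # v = 3 e1 + 4 e2 with q(v) = -25
print(multiply(v, v, quaternion_signature))      # scalar q(v)
print(multiply(e12, e12, quaternion_signature))  # e1 e2 also squares to -1
```

The $2^n$ bitmasks for $n$ basis vectors reflect that the algebra has dimension $2^n$; here $1, e_1, e_2, e_1 e_2$ play the role of $1, i, j, k$.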


In category theory one sees $Cl$ as a functor from the category of finite dimensional pseudo
Hilbert spaces $(V,q)$ to the category of unital associative algebras. The universal property
generalizes the process of getting from an algebra to the free algebra $F(A)$ or to get the
tensor algebra $T(M)$ from a module $M$ over a ring [125]. If $V$ has dimension $n$, the dimension
of the Clifford algebra $Cl(V,q)$ is $2^n$. As a vector space, the Clifford algebra is isomorphic to
the exterior algebra $Ext(V)$ which is like $Cl(V,q)$ a super algebra. This follows from the
universal property. As the involution $v \to -v$ does not change the quadratic form $q$, it lifts
to an involution $\alpha$ on $Cl(V,q)$. This produces a splitting
$Cl(V,q) = Cl(V,q)^{even} \oplus Cl(V,q)^{odd}$, where
$Cl(V,q)^{even} = \{x \in Cl(V,q), \alpha(x) = x\}$ and
$Cl(V,q)^{odd} = \{x \in Cl(V,q), \alpha(x) = -x\}$. Multiplication honors this grading. The
quadratic form $q$ can be extended from $V$ to $Cl(V,q)$: first define the transpose $x^T$ which
reverses the order $x = v_1 \otimes \cdots \otimes v_k \to x^T = v_k \otimes \cdots \otimes v_1$
and the scalar part $x_0$ of an element
$x = \sum_{i_1 < \cdots < i_k} x_{i_1 \dots i_k} e_{i_1} \otimes \cdots \otimes e_{i_k}$. The symmetric,
bilinear form $q_1$ on $V_1 = Cl(V,q)$ is then defined as $x \cdot y = (x^T y)_0$ and continues to be
non-degenerate if $q$ was. In the case when $q$ is positive definite, where $(V,q)$ is a Hilbert space,
the operation $(V,q) \to (V_1,q_1)$ can now be iterated and produces a sequence $(V_n,q_n)$ of Hilbert
spaces. Clifford algebras have many applications, like in algebraic geometry (starting with
Grassmann who introduced exterior algebras), in representation theory of classical Lie groups
(it was Elie Cartan who discovered in 1913 previously unknown representations of the orthogonal
group and called the elements on which the matrices operated "spinors" [112]), in physics or
in differential geometry. To the latter, Cartier writes in an introduction that since the
1950's, spinors and the associated Dirac equation have developed into a fundamental tool in
differential geometry. Indeed, one has at every point $x \in M$ of a Riemannian manifold a Clifford
algebra $Cl(T_xM, g(x))$ defined by $g(x)$, the quadratic form in the tangent space $V = T_xM$. This
produces a Clifford bundle. One can then ask under which conditions a spin structure
exists on $M$. It is the case if and only if the second Stiefel-Whitney class $w_2(M)$ is zero.
This topological obstruction for the existence of spin structures on an orientable Riemannian
manifold $(M,g)$ was found by André Haefliger in 1956. Haefliger defined the spin structure
on $(M,g)$ as a lift of the principal orthonormal frame bundle $F_{SO}(M) \to M$ to $F_{Spin}(M)$.
Not every Riemannian manifold is spin. While spheres $S^n$ are spin, the complex projective spaces
$CP^{2n}$, which are $4n$-manifolds, are not spin. The space of spinors of $(V,q)$ is the fundamental
representation of a Clifford algebra $Cl(V,q)$. Spinors are also vectors in a representation
of the double cover Lie group $Spin(p,q)$ of the special orthogonal group $SO(p,q)$ of signature
$(p,q)$. Representation theory of classical groups like $SO(n)$ or $Spin(n)$, a subgroup of the group
of invertible elements in a Clifford algebra of a Hilbert space, is a major motivator for Clifford
algebras.


236. TRANSCENDENTAL NUMBER THEORY 


A complex number is called algebraic if it is the root of a non-zero polynomial
$a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{Z}[x]$ with integer coefficients
$a_0,\dots,a_n \in \mathbb{Z}$. The algebraic numbers $A$ form a field. They can be
enumerated and so are a countable set in $\mathbb{C}$. As a consequence, almost all real numbers are
not algebraic. This argument of Cantor is a non-constructive but elegant proof of the existence of
non-algebraic numbers, numbers in the complement $\mathbb{C} \setminus A$ which are also called
transcendental numbers. Let us call a Gelfond-Schneider pair a pair of algebraic numbers
$\alpha, \beta$ for which $\alpha \neq 0,1$ and for which $\beta$ is not rational. The
Gelfond-Schneider theorem is:


Theorem: $(\alpha, \beta)$ Gelfond-Schneider $\Rightarrow$ any choice of $\alpha^\beta$ is transcendental.


The theorem is named after Alexander Osipovich Gelfond and Theodor Schneider. Gelfond
proved a special case in 1929 ($\beta$ imaginary quadratic) and the full version in 1934. Schneider
proved the same in his PhD thesis in 1934 written under the advice of Carl Siegel, who had already
proved it for real quadratic $\beta$. An example is the Gelfond-Schneider constant $2^{\sqrt{2}}$.
Another example is the Gelfond constant $e^{\pi} = (e^{i\pi})^{-i} = (-1)^{-i}$. One has to say
"any choice" because $(-1)^{-i}$ invokes the complex logarithm which has many branches. Any other
branch like $(-1)^{-i} = e^{-i \log(-1)} = e^{\pi + 2k\pi} = e^{(1+2k)\pi}$ is also transcendental.
A third example is the "eye for an eye" number $i^i = (e^{i\pi/2})^i = e^{-\pi/2}$ which is already
transcendental as a consequence of the Gelfond theorem because $i$ is algebraic, solving the
equation $1 + x^2 = 0$. The problem whether $\alpha^\beta$ is transcendental for a Gelfond-Schneider
pair had been asked by David Hilbert in 1900 and got to be known as Hilbert's seventh problem.
Questions about transcendental numbers are difficult. For example, one still does not know whether
$\pi^e$ is transcendental or not. See [256] and especially chapter 4 and 5 of [100].
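
The role of the branches of the logarithm can be seen numerically. The following is a minimal sketch; the function name `power_branch` is hypothetical and the branch index `k` labels the choice $\log(\alpha) + 2\pi i k$ of the logarithm. Floating point numbers say of course nothing about transcendence; the code only illustrates the identities for $(-1)^{-i}$ and $i^i$.

```python
import cmath
import math


def power_branch(alpha, beta, k=0):
    """alpha**beta computed with the branch log(alpha) + 2*pi*i*k of the logarithm."""
    return cmath.exp(beta * (cmath.log(alpha) + 2j * cmath.pi * k))


gelfond = power_branch(-1, -1j)        # principal branch of (-1)^(-i), equal to e^pi
other = power_branch(-1, -1j, k=1)     # next branch, equal to e^(3 pi)
i_to_i = power_branch(1j, 1j)          # principal branch of i^i, equal to e^(-pi/2)
print(gelfond.real, other.real, i_to_i.real)
```

All three values are real, and by the theorem all of them are transcendental.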


237. CONFORMAL MAPS 


Let $D_r(w) = \{|z - w| < r\}$ denote a disk of radius $r$ centered at $w \in \mathbb{C}$. If
$f : D_r(w) \to \mathbb{C}$ is a holomorphic function with $f'(w) \neq 0$, then by the implicit function
theorem, the map is locally invertible near $w$ and by Bloch's theorem, $f(D_r(w))$ contains a disk
of radius $c |f'(w)| r$ for some constant $c$. The best constant $c$ for which this works is called the
Bloch constant. An analytic, injective function $f$ on $D_r(w)$ is also called univalent and
$f : D_r(w) \to f(D_r(w))$ is called a conformal mapping. Koebe's quarter theorem is


Theorem: If $f$ is univalent on $D_r(w)$, then $D_{|f'(w)| r/4}(f(w)) \subset f(D_r(w))$.


The result had been conjectured in 1907 by Paul Koebe and was first proven by Ludwig Bieberbach
in 1916 [58]. The Koebe function $f(z) = z/(1-z)^2 = \sum_{n=1}^{\infty} n z^n$ shows that the constant
$1/4$ can not be improved upon. In [654], the result is stated and proven that for polynomials
of degree $n$, the image $f(D_r(w))$ contains the disk $D_{|f'(w)| r/n}(f(w))$. (The blog cites a
source, but the disk result is not that obvious there.) For more information on Koebe, see [L10].
Paul Koebe is also famous for his theorem generalizing the Riemann mapping theorem. It
states that any finitely connected domain is conformally equivalent to a circle domain, unique up
to Möbius transformations. (A circle domain is an open subset of $\mathbb{C}$ such that every
connected component of its boundary is either a circle or a point.) Koebe's Kreisnormierungsproblem
from 1909 asks whether every domain in $\mathbb{C}$ is conformally equivalent to a circle domain,
unique up to a Möbius transformation. The problem is open.
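
That the constant $1/4$ is sharp for the Koebe function can be observed numerically. The following is a minimal sketch for the unit disk ($w = 0$, $r = 1$, $f'(0) = 1$); the function names are hypothetical. On the circle $|z| = \rho$ the modulus $|f(z)| = \rho/|1-z|^2$ is smallest at $z = -\rho$, where it equals $\rho/(1+\rho)^2$, which approaches $1/4$ from below as $\rho \to 1$.

```python
import cmath
import math


def koebe(z):
    """The Koebe function z / (1 - z)^2."""
    return z / (1 - z) ** 2


def min_modulus_on_circle(rho, samples=3600):
    """Smallest value of |f(z)| over sample points on the circle |z| = rho."""
    return min(abs(koebe(rho * cmath.exp(2j * math.pi * k / samples)))
               for k in range(samples))


def taylor_partial_sum(z, terms=200):
    """Partial sum of the series sum n z^n, valid for |z| < 1."""
    return sum(n * z ** n for n in range(1, terms + 1))


for rho in (0.9, 0.99, 0.999):
    print(rho, min_modulus_on_circle(rho))
```

The image of the unit disk therefore omits points arbitrarily close to the circle of radius $1/4$, so no larger disk around $f(0) = 0$ fits into the image.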


238. SHANNON CAPACITY 


The independence number $\alpha(G)$ of a finite simple graph $G = (V,E)$ is the maximum number
of independent vertices in $G$ (a set of vertices is independent if the members of the set are
pairwise not adjacent). The strong product $G * H$ of two graphs $G,H$ has as the vertex
set the Cartesian product $V(G) \times V(H)$ of the vertex sets of $G$ and $H$ and as edges all pairs
of distinct vertices which when projected on any of the two graphs give either a vertex or an edge.
In communication theory, where $V$ is the alphabet and $E$ gives the pairs of letters which can be
confused, $\alpha(G^k)$ is the maximal number of $k$ letter messages which can be sent without the
danger of confusion. The limit $\Theta(G) = \lim_{k \to \infty} \alpha(G^k)^{1/k}$ is called the
Shannon capacity of the graph. One has clearly $\Theta(G) \geq \alpha(G)$ because there are at least
$\alpha(G)^k$ words of length $k$ which can not be confused. The extreme cases are $P_n$, the graph
with $n$ vertices and no edges, and $K_n$, the graph with $n$ vertices and all edges present. In
these cases $\Theta(P_n) = n$ and $\Theta(K_n) = 1$.


Theorem: The Shannon capacity of $G = C_5$ is $\sqrt{5}$.


The Shannon capacity was introduced by Claude Shannon in 1956 who wrote: "The zero
error capacity of a noisy channel is defined as the least upper bound of rates at which it is possible
to transmit information with zero probability of error". Shannon took the logarithm and called
$C_0 = \log(\Theta(G)) = \lim_{k \to \infty} \frac{1}{k} \log(\alpha(G^k))$ the zero-error capacity,
which reminds of a Lyapunov exponent measuring the exponential growth of a cocycle. Shannon
computed the capacity for all graphs with $n = 1,2,3,4,5$ nodes. The pentagon had been the smallest
graph for which he had been unable to determine the value; he only established
$\sqrt{5} = 2.236... \leq \Theta(G) \leq 5/2 = 2.5$. The true value for the pentagon was then computed
in [450], where also the notation $\Theta(G)$ appears. An exposition about Shannon capacity appears
in [470] (Miniature 28 and 29). The problem of computing $\Theta(G)$ is formidable. One does not even
know $\Theta(C_7)$.
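
The lower bound $\Theta(C_5) \geq \sqrt{5}$ comes from $\alpha(C_5) = 2$ and $\alpha(C_5^2) = 5$, which can both be found by exhaustive search. The following is a minimal sketch; the function names are hypothetical and the simple branching algorithm is only suitable for tiny graphs like the 25 vertex graph $C_5^2$.

```python
from itertools import product


def cycle_graph(n):
    """Adjacency sets of the cycle graph C_n on the vertices 0, ..., n-1."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


def strong_product(g, h):
    """Strong product: distinct pairs which are equal or adjacent in each coordinate."""
    def close(graph, a, b):
        return a == b or b in graph[a]

    vertices = list(product(g, h))
    return {(a, b): {(c, e) for (c, e) in vertices
                     if (a, b) != (c, e) and close(g, a, c) and close(h, b, e)}
            for (a, b) in vertices}


def independence_number(graph):
    """Size of a largest independent set, by branching on a vertex of maximal degree."""
    def solve(vertices):
        if not vertices:
            return 0
        v = max(vertices, key=lambda u: len(graph[u] & vertices))
        neighbors = graph[v] & vertices
        if not neighbors:          # no edges left: take all remaining vertices
            return len(vertices)
        take_v = 1 + solve(vertices - neighbors - {v})
        skip_v = solve(vertices - {v})
        return max(take_v, skip_v)

    return solve(frozenset(graph))


c5 = cycle_graph(5)
c5_squared = strong_product(c5, c5)
print(independence_number(c5), independence_number(c5_squared))
```

Since $\alpha(C_5^2)^{1/2} = \sqrt{5}$ is larger than $\alpha(C_5) = 2$, messages of length two already do better than single letters.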


239. OUTER BILLIARDS 


A convex curve $C$ in $\mathbb{R}^2$ defines an area-preserving map $T : X \to X$, where $X$ is the
unbounded region outside of the table bounded by $C$. A point $(x,y)$ is mapped into $T(x,y)$, which
is the point reflection of $(x,y)$ at the point $(p,q)$ which is the midpoint of the interval $J$
obtained by intersecting the counter clockwise tangent from $(x,y)$ with $C$. The map can be extended
to $C$ by defining $T(x,y) = (x,y)$ there. For most points $(x,y)$, the interval $J$ is a single
point, but already for polygons this can fail. We want to have $T$ defined everywhere, even though
it is then not continuous. The map $T$ is called the outer billiard map or dual billiard map
defined by $C$. The Penrose kite is the quadrilateral $ABPQ$ defined by the 5-gon $A,B,C,D,E$, where
$P = (AD) \cap (BE)$ and $Q = (BD) \cap (CD)$, with $(AD), (BE), (BD), (CD)$ denoting diagonal
segments in the pentagon. A table is called unstable if there exists $(x,y)$ such that
$|T^n(x,y)|$ is unbounded.


Theorem: Outer billiard at the Penrose kite is unstable. 


The outer billiard map $T$ is smooth if $C$ is smooth and strictly convex. The dynamical system had
been introduced by B.H. Neumann in the Manchester University Mathematics students journal
of 1959 [513] and was popularized in [501] [502]. The question whether there exists a convex table
for which an unbounded orbit of the map $T$ exists is known as the Moser-Neumann question
[592]. If $C$ is smooth and strictly convex, then KAM theory establishes invariant curves
for $T$ and so stability of the table [183]. For a class of tables called quasi-rational polygons,
which includes rational polygons and regular $n$-gons, all orbits are bounded [674] [201].
Also trapezoids lead to bounded orbits [446].
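
For a polygonal table the outer billiard map is a reflection at a vertex. The following is a minimal sketch, assuming the orientation convention that the tangent is taken through the vertex for which the rest of the polygon lies strictly to the left of the ray from the point to that vertex (the opposite convention gives the inverse map). The function name `outer_billiard_step` is hypothetical. Points on the lines through the edges, where the tangent meets the table in an interval, are not handled.

```python
from fractions import Fraction


def outer_billiard_step(point, polygon):
    """Reflect the point at the tangent vertex of a convex polygon.

    The tangent vertex v is the one for which all other vertices lie strictly
    to the left of the ray from the point through v.
    """
    px, py = point
    for vx, vy in polygon:
        dx, dy = vx - px, vy - py
        others = [w for w in polygon if w != (vx, vy)]
        if all(dx * (wy - py) - dy * (wx - px) > 0 for wx, wy in others):
            return (2 * vx - px, 2 * vy - py)
    raise ValueError("point is not outside the table or lies on a line through an edge")


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
start = (Fraction(3, 2), Fraction(3, 10))
orbit = [start]
for _ in range(4):
    orbit.append(outer_billiard_step(orbit[-1], square))
print(orbit)  # the orbit closes up after four reflections
```

The square is a rational polygon, so bounded (here even periodic) orbits are expected; the unbounded orbits of the theorem require a table like the Penrose kite.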


240. SANDWICH THEOREM 


Let $G = (V,E)$ denote a finite simple graph. In information theory, where $V$ is an alphabet of
symbols, the graph is the confusion graph in which symbols that can be confused are connected.
A function $f : V \to S^{n-1}$ into the unit sphere of a Euclidean space $\mathbb{R}^n$ is called an
orthonormal representation if the orthogonality condition $\langle f(u), f(v) \rangle = 0$ holds if
the vertices $u,v$ are not adjacent. The Lovász number is defined as


$\theta(G) = \min_{c,U} \max_{v \in V} \langle c, U(v) \rangle^{-2}$,


where $c$ is a unit vector and $U$ is an orthonormal representation. This corresponds to minimizing
the half-angle $\alpha$ of a rotational cone as $\theta(G) = 1/\cos^2(\alpha)$, where $c$ is the
symmetry axis of the cone. The Lovász number is multiplicative in the graph product because one can
build for every power $G^n$ also an umbrella $U^n$. Let $\alpha(G)$ be the independence number of $G$.
It is the clique number $c(\overline{G})$ of the graph complement $\overline{G}$. Let $\chi(G)$ denote
the chromatic number, the minimal number of colors which one can use to color the graph. It is the
clique covering number $\beta(\overline{G})$ of the graph complement. The following sandwich
inequality is the key to estimate the Shannon capacity
$\Theta(G) = \lim_{n \to \infty} \alpha(G^n)^{1/n}$ [597].


Theorem: $c(\overline{G}) = \alpha(G) \leq \Theta(G) \leq \theta(G) \leq \beta(G) = \chi(\overline{G})$.


The Lovász number $\theta(G)$ can be computed in polynomial time in the number of vertices. The
Shannon capacity is sandwiched between the $n$-th root of the independence number of any power
$G^n$ of $G$ and the Lovász number. See [407]. An example: since $(1,1), (2,3), (3,5), (5,4), (4,2)$
is an independent set in the strong product $C_5^2$, we have $\alpha(C_5^2) \geq 5$ and so
$\Theta(C_5) \geq \sqrt{5}$. The Lovász umbrella $U = \{u_1, u_2, u_3, u_4, u_5\}$ with
$u_k = [\cos(t)\sin(s), \sin(t)\sin(s), \cos(s)]$ with $\cos(s) = 1/5^{1/4}$, $t = 2\pi k/5$ gives
$\theta(C_5) \leq \sqrt{5}$. Therefore, the Shannon capacity of the pentagon is $\sqrt{5}$. This was
shown in [450], where also the notation $\Theta(G)$ appears. One does not know the Shannon capacity
of the heptagon. An exposition about Shannon capacity appears in [470].
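
Both halves of the pentagon argument can be checked directly. The following is a minimal sketch, assuming the vertices of $C_5$ are labeled $1,\dots,5$ cyclically and that the handle of the umbrella is the unit vector $c = (0,0,1)$. The variable names are hypothetical. The code verifies that the five words are pairwise not confusable and that the umbrella vectors of non-adjacent vertices are orthogonal.

```python
import math
from itertools import combinations


def adjacent_in_c5(a, b):
    """Adjacency in the pentagon with cyclically labeled vertices."""
    return (a - b) % 5 in (1, 4)


def adjacent_in_strong_square(p, q):
    """Adjacency of two-letter words in the strong product of C_5 with itself."""
    def close(a, b):
        return a == b or adjacent_in_c5(a, b)

    return p != q and close(p[0], q[0]) and close(p[1], q[1])


words = [(1, 1), (2, 3), (3, 5), (5, 4), (4, 2)]
is_independent = not any(adjacent_in_strong_square(p, q)
                         for p, q in combinations(words, 2))

# Lovasz umbrella: five unit vectors with common angle s to the handle
s = math.acos(5 ** -0.25)
umbrella = [(math.cos(2 * math.pi * k / 5) * math.sin(s),
             math.sin(2 * math.pi * k / 5) * math.sin(s),
             math.cos(s)) for k in range(1, 6)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


handle = (0.0, 0.0, 1.0)
worst_overlap = max(abs(dot(umbrella[j], umbrella[k]))
                    for j, k in combinations(range(5), 2)
                    if not adjacent_in_c5(j, k))
theta_bound = max(1 / dot(handle, u) ** 2 for u in umbrella)
print(is_independent, worst_overlap, theta_bound, math.sqrt(5))
```

The value `theta_bound` is an upper bound for $\theta(C_5)$ because the umbrella is one particular orthonormal representation; together with the independent set it pins the capacity to $\sqrt{5}$.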


241. SHANNON CAPACITY THEOREM 


A communication channel with bandwidth $B$ and signal to noise ratio $S/N$ (also abbreviated
SNR) has a maximal capacity $C$. These quantities are related by


Theorem: $C = B \log_2(1 + S/N)$.


This is also called the Shannon capacity theorem. The capacity $C$ is measured in bits per second.
The bandwidth $B$ is given in Hertz, $S$ is the average received signal power measured in Watts and
$N$ is the average power of the noise measured in Watts. The number $\log_2(1 + S/N)$ is the spectral
efficiency.
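
A worked number makes the formula tangible. The following is a minimal sketch; the function names and the example values (a 3000 Hertz channel with a signal to noise ratio of 30 decibel) are illustrative assumptions.

```python
import math


def capacity(bandwidth_hz, signal_watts, noise_watts):
    """Channel capacity C = B log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1 + signal_watts / noise_watts)


def db_to_ratio(db):
    """Convert a power ratio given in decibel to a plain ratio."""
    return 10 ** (db / 10)


c = capacity(3000, db_to_ratio(30), 1.0)
print(round(c), "bits per second")
```

Doubling the bandwidth doubles the capacity, while doubling the signal power only adds about one bit per second per Hertz once $S/N$ is large.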

In 1993, turbo codes appeared [103]. These were the first practical codes to get close to the
Shannon limit. These codes had already been patented by Claude Berrou in 1991. They are used in
the 3G and 4G mobile telephony standards. In 5G wireless communication other codes like
polar codes are used, which reach the Shannon channel capacity [507].


242. DIFFERENTIAL GALOIS THEORY 


A differential ring is a ring $R$ equipped with a derivation $D : R \to R$ which is linear and
satisfies the Leibniz rule $D(fg) = D(f)g + fD(g)$. A differential field is a differential ring
which is a field. The field of fractions of an integral domain $R$ (a ring $R$ for which the product
of two non-zero elements is not zero) is the smallest field containing $R$. A differential ring
extension $R < S$ has the ring $R$ as a sub-ring, with the derivation of $S$ restricted to $R$ agreeing
with the derivation on $R$. A differential ideal is an ideal $I \subset R$ that is invariant under $D$.
It defines the quotient ring $R/I$ with derivation $D(a+I) = D(a) + I$. The ring of differential
polynomials over $R$ is the polynomial ring $R[Y_1, Y_2, \dots]$ with a countable set of variables in
which $D$ is extended as $DY_i = Y_{i+1}$. If $F$ is a differential field and $K$ a differential field
extension, then $t \in K$ is called elementary over $F$ if it is algebraic over $F$, a logarithm or an
exponential of an element of $F$.


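Theorem: $f = e^{x^2}$ can not be integrated in elementary terms.


By a theorem of Liouville, an elementary anti-derivative would have to be of the form
$g e^{x^2}$ with a rational function $g$. After differentiation, there would have to exist a
rational function $g$ with $1 = g' + 2gx$, which is not possible [458]. For a book, see [459] or
the lectures [485].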


243. NON-LINEAR SCHROEDINGER EQUATION 


The non-linear Schrödinger equation (NLSE) $i u_t = -\Delta u + |u|^p u$ is an example of a
nonlinear partial differential equation for $u(t,x)$ with $x \in \mathbb{R}^d$. It is an example of a
classical field equation which can be used to describe Langmuir waves in hot plasmas or wave
propagation in fiber optics in which the non-linearity comes from self-phase modulation. It also
appears to have relevance in understanding the formation of rogue waves in the ocean. The latter
are unexpectedly large waves that can endanger ships. In dimension $d = 1$, the differential equation
is for $p = 2$ an example of an integrable system featuring non-linear phenomena like solitons. The
$L^2$-norm square of $u$ is called the mass $M(u)$ of $u$. It is preserved under the evolution. For any
$\lambda > 0$, the function $u_\lambda(t,x) = \lambda^{2/p} u(\lambda^2 t, \lambda x)$ is also a
solution and its mass is $M(u_\lambda) = \lambda^{4/p - d} M(u)$. The mass subcritical case is
$p < 4/d$. The Sobolev space $H^k(\mathbb{R}^d)$ is the space of functions $f$ such that $f$ as well as
all its weak derivatives up to order $k$ have finite $L^2$ norm.


Theorem: Global solutions of a subcritical NLSE exist in $H^1(\mathbb{R}^d)$.


The problem is ill-posed in H?/¢-?/?(R*). The mass-critical case is when p = 4/d. In that 
case there is a minimal mass mo for solutions to blow up. See [647]. 
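
The way the mass changes under the scaling symmetry can be checked numerically at a fixed time. The following is a minimal sketch in dimension $d = 1$, assuming a Gaussian profile as a stand-in for $u(0,x)$ (it is not claimed to be a solution) and a trapezoid rule for the integral. The function names are hypothetical. With $p = 2$ and $\lambda = 2$, the rescaled profile $\lambda^{2/p} u(\lambda x)$ should have $\lambda^{4/p - d} = 2$ times the mass.

```python
import math


def mass(profile, half_width=10.0, steps=40000):
    """Trapezoid approximation of the integral of |profile(x)|^2 over the real line."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(profile(x)) ** 2
    return total * h


def rescale(profile, lam, p):
    """Spatial part of the scaling symmetry: x -> lam^(2/p) * profile(lam * x)."""
    return lambda x: lam ** (2 / p) * profile(lam * x)


def gaussian(x):
    return math.exp(-x * x)


d, p, lam = 1, 2, 2.0
ratio = mass(rescale(gaussian, lam, p)) / mass(gaussian)
predicted = lam ** (4 / p - d)
print(ratio, predicted)
```

In the mass-critical case $p = 4/d$ the exponent $4/p - d$ vanishes and the mass is invariant under the scaling.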


244. MENGER’S THEOREM 


Let $G = (V,E)$ be a finite simple graph. For two disjoint subsets $A, B$, an AB-separator is a
set of vertices disjoint from $A,B$ which when removed disconnects $A$ from $B$. It is minimal if it
has the minimal possible number of vertices. An AB-connector is a set of pairwise disjoint paths
connecting $A$ with $B$. It is maximal if it has the maximal possible number of paths. Let us denote
by |minimal AB-separator| the number of vertices in a minimal AB-separator and by
|maximal AB-connector| the number of paths in a maximal AB-connector. The result is:


Theorem: |minimal AB-separator| = |maximal AB-connector|.


If $A$ and $B$ have an intersection, both numbers are just the cardinality of $A \cap B$ as zero length
paths $\{x\} \subset A \cap B$ are considered connectors. Another special case is when $G$ is 1-connected with
cut $\{x\}$ and where $A \cup \{x\} \cup B$ is a disjoint union. Now, $\{x\}$ is a minimal AB-separator. Since
every path from $A$ to $B$ crosses $x$, a maximal AB-connector consists of only one path. More
generally, if $G$ is $k$-connected, meaning that we need to remove a vertex set $X$ of cardinality $k$
to make it disconnected, then if $V = A \cup X \cup B$ is a disjoint union, the set $X$ is a minimal
AB-separator and a maximal AB-connector consists of $|X|$ paths. The proof is done by induction on
the number of edges in $G$. Menger proved this theorem in 1927 [483]. Menger did not use the
language of graphs but proved it for curves which are compact connected topological spaces
for which the boundary of arbitrarily small neighborhoods is disconnected. He considered them
as one-dimensional continua. Menger’s research was part of a program about dimension which
works for general topological spaces independent of metric. The graph theoretical version is
a special case as a geometric realization of the one-dimensional skeleton complex $V \cup E$ of a
graph $G = (V, E)$ defines a curve in Menger’s sense.
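
The equality in Menger's theorem can be checked by brute force on a small graph. The following
Python sketch is an illustration, not part of Menger's proof. It computes the maximal number of
pairwise vertex-disjoint paths from A to B with a unit-capacity maximum flow (every vertex is
split into an "in" and an "out" copy) and compares it with the size of a smallest separating
vertex set found by exhaustive search. Two things are assumptions of this sketch: the separator
is allowed to be any vertex set meeting every path from A to B (a common convention; on the
example graph it gives the same number as separators avoiding A and B), and the example graph
and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def max_disjoint_paths(edges, A, B):
    """Maximal number of pairwise vertex-disjoint paths from A to B."""
    cap = {}

    def arc(u, v):
        # Directed arc of capacity 1 together with a residual arc of capacity 0.
        cap[(u, v)] = 1
        cap.setdefault((v, u), 0)

    vertices = {v for e in edges for v in e} | set(A) | set(B)
    for v in vertices:
        arc((v, "in"), (v, "out"))  # each vertex may lie on one path only
    for u, v in edges:
        arc((u, "out"), (v, "in"))
        arc((v, "out"), (u, "in"))
    for a in A:
        arc("source", (a, "in"))
    for b in B:
        arc((b, "out"), "sink")

    adj = {}
    for u, v in cap:
        adj.setdefault(u, []).append(v)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {"source": None}
        queue = deque(["source"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            return flow
        v = "sink"
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def reaches(edges, A, B, removed):
    """True if some vertex of B can be reached from A after deleting `removed`."""
    nbrs = {}
    for u, v in edges:
        if u not in removed and v not in removed:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    start = set(A) - removed
    seen, queue = set(start), deque(start)
    while queue:
        u = queue.popleft()
        if u in B:
            return True
        for v in nbrs.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def min_separator_size(edges, A, B):
    """Size of a smallest vertex set whose removal disconnects A from B."""
    vertices = sorted({v for e in edges for v in e} | set(A) | set(B))
    for k in range(len(vertices) + 1):
        for S in combinations(vertices, k):
            if not reaches(edges, set(A), set(B), set(S)):
                return k


# Hypothetical example: A = {1, 2}, B = {6, 7}, inner vertices 3, 4, 5.
EDGES = [(1, 3), (2, 3), (2, 4), (3, 6), (4, 5), (5, 7), (3, 5)]
A, B = {1, 2}, {6, 7}
print(max_disjoint_paths(EDGES, A, B), min_separator_size(EDGES, A, B))
```

The exhaustive search is exponential in the number of vertices and only meant for tiny examples;
the flow computation is the part that scales.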


245. APERY’S THEOREM 


The Apéry constant $\zeta(3) = \sum_{n=1}^{\infty} 1/n^3$ is a special value of the Riemann zeta function. R.
Apéry proved in 1979 that


Theorem: The Apéry constant is irrational. 


While one knows that all $\zeta(2n)$ are irrational for $n \geq 1$, starting with $\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$,
the irrationality of the odd zeta values $\zeta(2n+1)$ is not yet known for $2n + 1 > 3$. One does not
know for example whether $\zeta(5)$ is irrational. The problem of whether the Apéry constant is
irrational is in the “most mysterious unsolved math problem". One only knows that infinitely many
of the numbers $\zeta(2n + 1)$ are irrational. On the history: Euler, who gained fame with the
computation of $\zeta(2) = \pi^2/6$, already computed $\zeta(3)$ to several digits. The entire book is
dedicated to Zeta-3.
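
Computing the Apéry constant to several digits, as Euler did, is easy today. The following Python
sketch compares the slowly converging defining sum with the rapidly converging series
$\zeta(3) = \frac{5}{2} \sum_{n \geq 1} (-1)^{n-1} / (n^3 \binom{2n}{n})$, which is associated with
Apéry's work. The function names and the numbers of terms are illustrative choices. A numerical
value of course says nothing about irrationality.

```python
from fractions import Fraction
from math import comb


def zeta3_direct(terms=1000):
    """Partial sum of 1/n^3; the error after N terms is about 1/(2 N^2)."""
    return sum(1.0 / n**3 for n in range(1, terms + 1))


def zeta3_fast(terms=30):
    """Partial sum of (5/2) * sum (-1)^(n-1) / (n^3 * binomial(2n, n)).

    The terms decay roughly like 4^(-n), so 30 terms already exceed
    double precision. Exact rational arithmetic avoids rounding in the sum.
    """
    total = Fraction(0)
    for n in range(1, terms + 1):
        total += Fraction((-1) ** (n - 1), n**3 * comb(2 * n, n))
    return Fraction(5, 2) * total


print(zeta3_direct())       # agrees with zeta(3) to about 6 digits
print(float(zeta3_fast()))  # agrees with zeta(3) to double precision
```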


246. VIETORIS THEOREM 


A topological space $(X, O)$ is compact if every open cover (a subset $F$ of $O$ whose union is
$X$) has a finite sub-cover (a finite subset of $F$ whose union is $X$). If $A \in O$ then $X \setminus A$ is called
closed. A topological space is normal if any two disjoint closed sets $A, B$ in $X$ have disjoint open
neighborhoods $U, V$. Non-normal topological spaces are relevant in mathematics: the Zariski
topology on the spectrum of a ring for example is non-normal. The Vietoris theorem is


Theorem: A compact Hausdorff topological space is normal.


Leopold Vietoris proved this in 1921 and considered it his most important result, even though he
lived 110 years, wrote his last paper, on trigonometric sums, at the age of 103, and wrote more
than half of his papers after his sixtieth birthday [558]. Normality is also called Tietze’s
normality condition. In modern topology books the normality condition is called Axiom $T_4$
but one has to be careful, as sometimes, it also assumes Hausdorff (any two points in $X$ can be
separated by open neighborhoods). Normality $T_4$ does not imply Hausdorff $T_2$: an example is
the topological space $X = \{a, b\}$ with $O = \{\emptyset, X\}$, which has only $\emptyset, X$ as closed sets, and both
sets are both open and closed. They also have disjoint open neighborhoods as they themselves are
open neighborhoods (they are both clopen sets). Indeed, any indiscrete topological space
is normal. But if $X$ contains at least two points, it is not Hausdorff: there are two points
$a, b$ that can not be separated by open neighborhoods. Any indiscrete topological space with
at least two points is also an example of a compact non-Hausdorff space. The theorem
of Vietoris assures that the seemingly stronger condition of normality holds for all compact
Hausdorff spaces. The Hausdorff assumption is needed: the spectrum of a ring with the Zariski
topology is compact but in general not normal. Vietoris is the father
of modern convergence concepts like filter base or nets and modern notions of compactness.
Normality is important because of Tietze’s extension theorem stating that a continuous
real-valued function on a closed subset of a normal topological space can be extended to the
entire space. The Tietze theorem was proven by Brouwer and Lebesgue for Euclidean spaces,
extended by Tietze to metric spaces and by Urysohn to normal spaces.
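
For finite topological spaces, the separation properties discussed above can be checked
mechanically. The following Python sketch verifies that the indiscrete topology on $\{a, b\}$ is
normal but not Hausdorff. The function names are hypothetical and the topologies are entered by
hand as lists of open sets; the code does not check that such a list really is a topology.

```python
from itertools import combinations


def separated(P, Q, opens):
    """True if P and Q have disjoint open neighborhoods."""
    return any(P <= U and Q <= V and not (U & V) for U in opens for V in opens)


def is_normal(X, opens):
    """Any two disjoint closed sets have disjoint open neighborhoods."""
    closed = [X - U for U in opens]
    return all(separated(A, B, opens)
               for A in closed for B in closed if not (A & B))


def is_hausdorff(X, opens):
    """Any two distinct points have disjoint open neighborhoods."""
    return all(separated(frozenset([a]), frozenset([b]), opens)
               for a, b in combinations(sorted(X), 2))


X = frozenset({"a", "b"})
indiscrete = [frozenset(), X]
discrete = [frozenset(), frozenset({"a"}), frozenset({"b"}), X]

print(is_normal(X, indiscrete), is_hausdorff(X, indiscrete))
print(is_normal(X, discrete), is_hausdorff(X, discrete))
```

Every finite space is compact, so both examples are compact; only the discrete one is Hausdorff.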


247. WHITNEY EXTENSION THEOREM 


Given a $C^m$ function $f$ on $\mathbb{R}^n$, Taylor’s theorem assures that
$f(x) = \sum_{|k| \leq m} f^{(k)}(y) (x-y)^k/k! + \sum_{|k| = m} R_k(x,y) (x-y)^k/k!$ with $R_k(x,y) \to 0$
uniformly as $x, y \to a$. This gives relations
$f^{(r)}(x) = \sum_{|k| \leq m-|r|} f^{(r+k)}(y) (x-y)^k/k! + R_r(x,y)$. A set $F = \{f_k\}$ of functions on $\mathbb{R}^n$ with
multi-index $|k| \leq m$ satisfying these Taylor compatibility conditions is called a Taylor
compatible set. A closed subset $A$ of $\mathbb{R}^n$ has the Whitney extension property if there is
a function $f \in C^m$ such that $f^{(k)}(x) = f_k(x)$ for $x \in A$ and such that $f$ is real analytic at every
point of $\mathbb{R}^n \setminus A$.


Theorem: A Taylor compatible F, A has the Whitney extension property. 


Hassler Whitney proved this result at Harvard in 1934 [691] just two years after getting his PhD 
there (remarkably in the completely different field of graph theory). He cites who proved 
a special case. Chapter 12 in gives an exposition of Whitney’s work on the theorem. To
cite from this book: “Hass found it a real challenge to go beyond the first dimension. He drew
picture after picture, but the problem seemed stubbornly intent on putting up a succession of
frustrating barriers. His eventual success in 1933 was a real tour de force for the 26-year old."


248. MARKOV’S INEQUALITY 


Let $X : \Omega \to \mathbb{R}$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$ and let $a > 0$ be a real
constant. In all that follows, using $E[f(X)]$ for some function $f$ assumes that this expectation
is finite, meaning that $f(X) \in L^1(\Omega, P)$. The Markov inequality is $P[|X| \geq a] \leq E[|X|]/a$.
More generally, if $f : [0,\infty) \to [0,\infty)$ is a strictly monotonically increasing function with
$f(0) = 0$ and $a > 0$, then


Theorem: $P[|X| \geq a] \leq E[f(|X|)]/f(a)$.


The proof is done by defining for all $a > 0$ a new random variable $A(x) = a$ if $X(x) \geq a$ and
$A(x) = 0$ else (for a non-negative $X$; otherwise replace $X$ by $|X|$). Then $0 \leq f(A(x)) \leq f(X(x))$ and
$E[f(X)] = \int_\Omega f(X(x)) \, dP(x) \geq \int_\Omega f(A(x)) \, dP(x) = f(a) P[X \geq a]$. This gives
$P[X \geq a] \leq E[f(X)]/f(a)$. For example, if $f(x) = x^2$ and applying the inequality to
$X = Y - E[Y]$ such that $E[f(X)] = Var[Y]$, one has the Chebyshev inequality
$P[|Y - E[Y]| \geq a] \leq Var[Y]/a^2$. For $f(x) = e^x$ one gets the Chernoff
inequality $P[X \geq a] = P[e^X \geq e^a] \leq E[e^X]/e^a$. This holds also for every $f(x) = e^{tx}$ with
$t > 0$. One has the Chernoff bound $P[X \geq a] \leq \inf_{t > 0} E[e^{tX}]/e^{ta}$, which is of interest as
$E[e^{tX}]$ is the moment generating function of the random variable $X$.
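
The three inequalities can be checked on a concrete distribution. The following Python sketch
uses a fair die as the random variable, which is an illustrative choice, as are the thresholds
and the grid of values for $t$. Probabilities and moments are computed exactly with rational
numbers; only the Chernoff bound uses floating point.

```python
from fractions import Fraction
from math import exp


def expectation(dist, f=lambda x: x):
    """E[f(X)] for a finite distribution given as {value: probability}."""
    return sum(p * f(x) for x, p in dist.items())


def probability(dist, event):
    """P[event(X)] for a finite distribution given as {value: probability}."""
    return sum(p for x, p in dist.items() if event(x))


die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die
a = 5

# Markov: P[|X| >= a] <= E[|X|]/a, here 1/3 <= 7/10
markov_lhs = probability(die, lambda x: abs(x) >= a)
markov_rhs = expectation(die, abs) / a

# Chebyshev: P[|Y - E[Y]| >= c] <= Var[Y]/c^2, here 1/3 <= 35/48
mean = expectation(die)
variance = expectation(die, lambda x: (x - mean) ** 2)
c = 2
chebyshev_lhs = probability(die, lambda x: abs(x - mean) >= c)
chebyshev_rhs = variance / c**2

# Chernoff: P[X >= a] <= E[e^(tX)]/e^(ta), minimized over a grid of t > 0
grid = [k / 10 for k in range(1, 51)]
chernoff_rhs = min(
    expectation(die, lambda x, t=t: exp(t * x)) / exp(t * a) for t in grid
)

print(markov_lhs, markov_rhs)
print(chebyshev_lhs, chebyshev_rhs)
print(float(markov_lhs), chernoff_rhs)
```

Minimizing over a finite grid gives an upper bound for the infimum over all $t > 0$, so the
printed Chernoff value is still a valid bound.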


249. MAGNUS FREIHEITSSATZ 


Let $G = \langle X \mid R \rangle$ be a finitely presented group with generators $X = \{x_1, \dots, x_n\}$ and relations
$R = \{r_1, \dots, r_q\}$. The group $G$ is called a 1-relator group or Magnus group if $q = 1$ and if
the relation $r$ has the property that $r$ is cyclically reduced and that all generators appear in
$r$. A word is called cyclically reduced if every cyclic permutation of the word is reduced. A word
is called reduced if it does not contain subwords of the type $x x^{-1}$ or $x^{-1} x$. For example, the
word $aba^{-1}$ is not cyclically reduced. A cyclic permutation can be reduced to $b$.


Theorem: $Y \subset X$, $Y \neq X$ generates a free group in a Magnus group $G$.


This is a result of Wilhelm Magnus of 1930. It means that given say the generators $x_1, \dots, x_{n-1}$,
the only relations involving them are the trivial ones. It is also called the Independence
Theorem. In an overview paper, the Freiheitssatz is considered a non-commutative analog of a
similar result in a commutative algebraic structure. If $V$ is an $n$-dimensional linear space over a
field and $W \subset V$ is a linear subspace given by a single equation $\sum_i a_i x_i = 0$, then $W$ has dimension
$n - 1$ meaning that $W$ is a free Abelian subgroup of $V$. Another analogy in that overview
paper is to compare it with a situation in algebraic geometry: an irreducible algebraic equation
of $n$ complex variables in which all $n$ variables appear can not be used to derive any irreducible
algebraic equation in which not all of these variables appear. The theory of finitely presented
groups was initiated by Max Dehn in 1912. Magnus wrote his thesis in 1931 under the guidance
of Dehn. Dehn had raised the word problem, the conjugacy problem and the isomorphism
problem. Dehn also proposed that the Freiheitssatz could hold true. As Magnus pointed out,
the Freiheitssatz assures that the 1-relator group has a solvable word problem. In
1954, Novikov came up with the first finitely presented group with insoluble word problem. See
[221] [44] [43].
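
The notions of reduced and cyclically reduced words are easy to make concrete. In the following
Python sketch, a word is a string in which a lower case letter stands for a generator and the
corresponding upper case letter for its inverse, so that "abA" encodes $aba^{-1}$. This encoding
and the function names are assumptions made for the illustration.

```python
def inverse(letter):
    """Inverse of a generator: 'a' <-> 'A'."""
    return letter.swapcase()


def is_reduced(word):
    """No subword of the type x x^-1 or x^-1 x."""
    return all(inverse(x) != y for x, y in zip(word, word[1:]))


def free_reduce(word):
    """Cancel adjacent inverse pairs until the word is reduced."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def cyclic_permutations(word):
    """All rotations of the word."""
    return [word[i:] + word[:i] for i in range(len(word))]


def is_cyclically_reduced(word):
    """Every cyclic permutation of the word is reduced."""
    return all(is_reduced(w) for w in cyclic_permutations(word))


word = "abA"  # a b a^-1
print(is_reduced(word), is_cyclically_reduced(word))
print([free_reduce(w) for w in cyclic_permutations(word)])
print(is_cyclically_reduced("abAB"))  # the commutator a b a^-1 b^-1
```

The word $aba^{-1}$ is reduced but two of its cyclic permutations are not; both of them reduce
to $b$.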


250. MARTIN’S AXIOM 


Let $\kappa$ be a cardinal. The Martin condition $M(\kappa)$ is the statement that if $P$ is a partial order
satisfying the countable chain condition and $D$ is a family of dense sets in $P$ with cardinality
less or equal than $\kappa$, then there is a filter $F$ on $P$ such that $F$ intersects every element in $D$. The
Martin axiom states that $M(\kappa)$ holds for every cardinality $\kappa$ smaller than $2^{\aleph_0}$. One has


Theorem: In ZFC, $M(\aleph_0)$ holds but $M(2^{\aleph_0})$ fails.


The first statement is the Rasiowa-Sikorski lemma. The second statement is proven by an 
example: the set [0,1] with usual topology is separable and so satisfies the countable chain 
condition. An individual point is nowhere dense but the union has $2^{\aleph_0}$ points. Martin’s axiom
was introduced in 1970 by Tony Martin and Robert Solovay [465]. The continuum hypothesis
CH implies Martin’s axiom but it is also consistent with ZFC and the negation of CH.


EPILOGUE: VALUE 


Which mathematical theorems are the most important ones? This is a complicated variational 
problem because it is a general and fundamental problem in economics to define “value". The 
difficulty with the concept is that “value" is often a matter of taste or fashion or social influence 
and so an equilibrium of a complex social system. Value can change rapidly, sometimes 
triggered by small things. The reason is that the notion of value like in game theory depends 
on how it is valued by others. A fundamental principle of catastrophe theory is that maxima
of a functional can depend discontinuously on parameters. As value is often a social concept,
this can be especially brutal or lead to unexpected viral effects. First of all, value is often
linked to historical or moral considerations. We tend more and more to link artistic and
scientific value also to the person. In mathematics, the work of Oswald Teichmüller or Ludwig
Bieberbach for example is linked to their political views and so devalued despite its brilliance
[593]. This happens also outside of science, in art or in industry. The value of a company now 
also depends on what “investors think" or what analysts see for potential gain in the future. 
Social media try to measure value using “likes" or “number of followers". A majority vote is a 
measure, but how well can it predict what will be valuable in the future? Majority votes
taken over longer times would give a more reliable value functional. If one could persuade
every mathematician to give a list of the two dozen most fundamental theorems and do that
every couple of years, reflecting the “wisdom of an educated crowd", one could probably get a
pretty good value functional. Ranking theorems and results in mathematics is a mathematical
optimization problem in itself. One could use techniques known in the “search industry". One
idea is to look at the finite graph in which the theorems are the nodes and where two theorems 
are related to each other if one can be deduced from the other (or alternatively connect them 
if one influences the other strongly). One can then run a page rank algorithm [438] to see 
which ones are important. Running this in each of the major mathematical fields could give an 
algorithm to determine which theorems deserve the name “fundamental". Now, there was also
a problem with publishing the page rank as people tried to manipulate it using search engine
optimization tricks. Google no longer gives the page rank of a website, simply to avoid
such manipulations. The story illustrates that reflecting about algorithms that measure value
can influence the algorithm itself and even destroy it. Similarly as in quantum mechanics, the
measurement process can influence the experiment to the point that it is no longer reliable.
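
The idea of ranking theorems with a page rank algorithm can be sketched in a few lines. The
following Python code is a minimal power iteration, not the algorithm used by any search engine.
The tiny graph is hypothetical and only serves as an illustration: a link from one result to
another means, by assumption, that the first one is deduced from the second, so that rank flows
towards results which many others depend on. The damping factor 0.85 is the conventional choice
and also an assumption here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for page rank on a graph {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A node without outgoing links spreads its rank uniformly.
                share = damping * rank[v] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


# Hypothetical dependency graph between four results.
links = {
    "Pythagoras": [],
    "Law of cosines": ["Pythagoras"],
    "Parallelogram law": ["Law of cosines", "Pythagoras"],
    "Distance formula": ["Pythagoras"],
}
rank = pagerank(links)
for name in sorted(rank, key=rank.get, reverse=True):
    print(f"{rank[name]:.3f}  {name}")
```

In this toy graph the result that everything else depends on ends up with the highest rank,
which matches the suggestion to measure how fundamental a theorem is by how much is built on it.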


OPINIONS 


A course “Math from a historical perspective", taught a couple of times at the Harvard
extension school, motivated me to write up the present document. As part of a project,
students were often asked to write about some theorems or mathematical fields or a mathematical
person and try to rank it. The present document benefits from these writings as it is interesting 
to see what others consider important. Sometimes, seeing different opinions can change your 
own view. I was definitely influenced by students, teachers, colleagues and literature as well of 
course by the limitations of my own understanding. My own point of view has already changed 
while writing the actual theorems down and will certainly change more. Value is more like an 
equilibrium of many different factors. In mathematics, values have changed rapidly over time. 
And mathematics can describe the rate of change of value [549]. Major changes in the
appreciation for mathematical topics came throughout history, sometimes with dramatic shifts
like when mathematical notations started to appear, at the time of Euclid, then at the time
when calculus was developed by Newton and Leibniz. Also the development of more abstract
algebraic constructs or topological notions, like for example the start of set theory, changed
things considerably. In more modern times, the categorization of mathematics and the
development of rather general and abstract new objects (for example with new approaches
taken by Grothendieck) changed the landscape. In most of the new developments, I remain the
puzzled tourist wondering how large the world of mathematics is. It has become so large that
continents have emerged: we have applied mathematics, mathematical physics, statistics,
computer science and economics which have drifted away to independent subjects and
departments. Classical mathematicians like Euler would now be called applied mathematicians,
de Moivre would maybe be stamped as a statistician, Newton a mathematical physicist,
Turing a computer scientist and von Neumann an economist or physicist.


SEARCH 


A couple of months before starting this document in 2018, when looking online for “George 
Green", the first hit in a search engine would be a 22 year old soccer player. (This was not 
a search bubble thing [532] as it was tested with cleared browser cache and via anonymous 
VPN from other locations, where the search engine can not determine the identity of the user). 
Now, I love soccer, played it myself a lot as a kid and also like to watch it on screen, but 
is the English soccer player George William Athelston Green really more “relevant" than the 
British mathematician George Green, who made fundamental breakthrough discoveries which
are used in mathematics and physics? Shortly after I had tweeted about this strange ranking
on December 27, 2017, the page rank algorithm must have been adapted, because already on
January 4th, 2018, the mathematician George Green appeared first (again not a search bubble
phenomenon, where the search engine adapts to the user's taste and adjusts the search to
their preferences). It is not impossible that my tweet has reached, meandering through social
media, some search engine engineer who was able to rectify the injustice done to the miller
and mathematician George Green. The theory of networks shows that “small world phenomena"
can explain why such influences or synchronizations are not that impossible
[637]. But coincidences can also be deceiving. Humans just tend to observe coincidences even
though there might be a perfectly mathematical explanation. This is prototyped by the birthday
paradox [475]. But one must also understand that search needs to serve the majority. For a
general public, a particular subject like mathematics is not that important. When searching 
for “Hardy" for example, it is not Godfrey Hardy who is mentioned first as a person belonging 
to that keyword but Tom Hardy, an English actor. This obviously serves most of the searches 
better. As this might infuriate particular groups (here mathematicians), search engines have 
started to adapt the searches to the user, giving the search some context which is an important 
ingredient in artificial intelligence. The problem is the search bubble phenomenon which runs 
hard against objectivity. Textbooks of the future might adapt their language, difficulty and
even their citations or the historical credit depending on who reads them. Novels might adapt
the language to the age of the user, the country where the user lives, and the ending might
depend on personal preferences or even the medical history of the user (the medical history of
course being accessible by the book seller via “big data" analysis of user behavior and tracking;
this is not SciFi, it is already happening): even classical books are cleansed for political
correctness, and many computer games are already customizable to the taste of the user. A person
flagged as sensitive or a
young child might be served a happy ending in a novel rather than a conclusion of the novel in 
an ambivalent limbo or even a disaster. explains the difficulty. The issues have amplified 
even more in more recent times. The filter bubble phenomenon even influences elections and
polarizes opinions, as one no longer even hears alternative arguments.


BEAUTY 


In order to determine what is a “fundamental theorem", also aesthetic values matter. But the 
question of “what is beautiful" is even trickier. Many have tried to define and investigate the 
mechanisms of beauty: [297] [686] [6877] [5770] 9} [497]. In the context of mathematical formu- 
las, the question has been investigated within the field of neuro-aesthetics. Psychologists, in 
collaboration with mathematicians have measured the brain activity of 16 mathematicians with 
the goal to determine what they consider beautiful [579]. The Euler identity $e^{i\pi} + 1 = 0$ was
rated high with a value 0.8667 while a formula for $1/\pi$ due to Ramanujan was rated low with an
average rating of -0.7333. Obviously, what mattered was not only the complexity of the formula
but also how much insight the participants got when looking at the equation. The authors
of that paper cite Plato who wrote once “nothing without understanding would ever be more
beauteous than with understanding". Obviously, the formula of Ramanujan is much deeper but
it requires some background knowledge for being appreciated. But the authors acknowledge
in the discussion that correlating “beauty and understanding" can be tricky. Rota [570]
notes that the appreciation of mathematical beauty in some statement requires the ability to 
understand it. And [497] notices that “even professional mathematicians specialized in a certain 
field might find results or proofs in other fields obscure" but that this is not much different 
from say music, where “knowledge about technical details such as the differences between things 
like cadences, progressions or chords changes the way we appreciate music" and that “the sym- 
metry of a fugue or a sonata are simply invisible without a certain technical knowledge". As 
history has shown, there were also always “artistic connections" [239] [95] as well as “religious 
influences" [449] [611]. The book [239] cites Einstein who defines “mathematics as the poetry 
of logical ideas". It also provides many examples and illustrations and quotations. And there 
are various opinions. Rota argues that beauty is a rather objective property which depends 
on historic-social contexts. And then there is taste: what is more appealing, the element of
surprise like the Birthday paradox or Petersburg paradox in probability theory, or the
Banach-Tarski paradox in measure theory, which obviously does not trigger any enlightenment nor
understanding if one hears it for the first time: one can disassemble a sphere into 5 pieces, rotate
and translate these pieces in space to build up two spheres? Or the surprising fact that the
infinite sum $1+2+3+4+5+\cdots$ is naturally equal to $-1/12$ as it is $\zeta(-1)$ (which is a value
defined by analytic continuation and can hardly be understood without training in complex
analysis)? The role of aesthetics in mathematics is especially important in education, where
mathematical models [224], mathematical visualization [41], artistic enrichment [225], surfaces
[424], or 3D printing [594] [404] can help to make mathematics more approachable. Update 2019:
as reported in Science Daily, a study of the University of Bath concludes that people appreciate
beauty in complex mathematics [355]. The results which had been chosen in that study had
been rather simple however: the infinite geometric series formula, Gauss’s summation trick
for positive integers, the Pigeonhole principle, and a geometric proof of a Faulhaber formula
for the sum of the first powers of integers. When judging the mathematics describing physical
models, Paul Dirac was probably the most outspoken advocate for beauty. He stated in [175]
for example: “It seems to be one of the fundamental features of nature that fundamental physical
laws are described in terms of a mathematical theory of great beauty and power, needing quite
a high standard of mathematics for one to understand it."


DEEPNESS 


A taxonomy is a way to place objects like theorems in a multi-dimensional cube of numerical
attributes. Besides the ugly-beauty parameter, one can think of all kinds of taxonomies
to classify theorems. There is the simplicity-complexity axis, which could be measured
by the number of mathematicians who can understand the proof, the boring-interesting
axis which measures the entertainment value or potential for pop culture appearances, the
useless-applicable axis which measures how many applications the theorem has in engineering,
economics or other sciences, and the easy-hard axis which could be measured in the amount of
time one needs to understand the proof. And then there is the shallow-deepness axis, which
is even more subjective but which could be quantified too. One could look for example at how
long a proof path is from basic axioms to the theorem and weight each path with how many
other interesting theorems have been visited along the way. Also of benefit is how many different
areas of mathematics have been visited along the proof. A deep theorem could be obtained
by proving it with different long paths, each reaching other already established deep results.
One can now argue how to average all these paths, whether one should take the minimal or
maximal deep proof path. The latter point was addressed in [437].


Maybe unlike with other parameters, the antipode “trivial" of “deepness" has a positive side 
too: it is maybe not “shallow" but what we call “fundamental". Fundamental theorems are 
not necessarily deep. The Pythagorean theorem for example or Zorn’s lemma are not deep 
but they are fundamental. Basic logical identities based on Boolean algebra which are used in 
almost every proof step are of fundamental importance but not deep. One could still go back 
and measure how fundamental something is by how many deep theorems can be proven with 
it. 


[665] points out that the adjective “deep" is used for all kinds of mathematical objects: theorems,
proofs, problems, insights or concepts can be described as deep and that often the theorem is
called deep if its proof is deep. Urquhart points out however that “if a simple proof is discovered
later, perhaps the result might be reclassified as not deep at all" and that so, the difficulty is
that the concept “mathematical depth" is not so well defined. The author then mentions the graph
minor theorem (in every infinite set of graphs there are two for which one is the minor of the
other), which Diestel calls “one of the deepest theorems that mathematics has to offer".
Some justification for the deepness of the result is that it has made an impact also outside graph
theory and that its proof takes well over 500 pages.


[665] also collects opinions of philosophers and mathematicians about deepness. Cited is for
example Hardy, who gives an extended discussion on depth and sees mathematical ideas “arranged
somehow in strata, each stratum being linked by a complex relation both among themselves
and with those above and below, the lower the stratum, the deeper the idea". Also cited is
the book of Penelope Maddy which expresses doubt that mathematical depth really
can be accounted for productively because it is a “catch-all" for the various kinds of virtues
and often used as a term of approbation, but always in an informal context without giving
a precise meaning. Also cited in [665] are present day mathematicians like Gowers [262] who
links “deep" with “hard" and contrasts it with “obvious". If a proof requires a non-obvious idea, 
then it is considered deep. Also cited is a later statement of Gowers telling that “The normal 
use of the word ‘deep’ is something like this: a theorem is deep if it depends on a long chain of 
ideas, each involving a significant insight". Finally mentioned is Tao [650] who lists over twenty 
meanings to “good mathematics": (be a breakthrough for solving a problem, masterfully using 
technique, building theory, having insight, discovering something unexpected, having applica- 
tion, clear exposition, good pedagogy enabling understanding, long-range vision, good taste, 
public relations, advancing foundations, rigorous, beautiful, elegant, creative, useful, sharp to 
known counterexamples, intuitive and visualisable, being definitive like a classification result 
and finally deep which Tao defines as “manifestly non-trivial, for instance by capturing a subtle 
phenomenon beyond the reach of more elementary tools".) [665] also illustrates the concept of 
deepness with moves one sees in chess: a combination of moves which are not obvious and have 
an element of surprise like in the Byrne-Fischer game of 1963-1964. 


In a talk at the “Mathematical Depth Workshop" of April 11-12, 2014, John Stillwell gave the
following examples of deep theorems: Dirichlet’s theorem on primes in an arithmetic progression,
Perelman’s theorem on Poincaré’s conjecture, Fermat’s last theorem and then the classification 
of finite simple groups. A deep theorem should be difficult, surprising, important, fruitful, 
elegant and fundamental. As less deep but accessible, he gives the independence of the parallel 
postulate, the fundamental theorem of algebra, the existence of division algebras, the Riemann 
integrability of continuous functions, the uncountability of $\mathbb{R}$. Robert Geroch said in that same
workshop that deep theorems should be detached from connections with people, or then have 
connections with physics: examples are representations of the Lie group SL(2,C), the TCP 
theorem or the appearance of symmetric hyperbolic partial differential equations. Jeremy 
Gray stressed then the importance of multiple proofs, to give more reasoning, show different 
methodologies, see new routes or produce more purity. He said that the difference between 
deep and difficult is that deep things should be more hidden. Deep according to Gauss has to 
be “difficult". The result may be elegant or beautiful, but the proof needs to be difficult. Marc
Lange argues to assign the attribute deep to the proof of a theorem and not the theorem
itself. The reason is that there could be multiple proofs, where one proof is deeper than the
other. This could mean for example that a theorem which is considered deep continues to have
a deep proof even if it turns out to be provable in a very simple and dull way.


THE FATE OF FAME


Aesthetics is a fragile subject. If something beautiful has become too popular and so entered
pop-culture, a natural aversion against it can develop. The feeling is justified that popular
things are often frivolous. It is also in danger to become a cliché or even become kitsch
(which is a word used to tear down popular stuff or to label poor taste). The Mandelbrot set
for example is just marvelous, but it hardly excites anymore because it is so commonly
known. The Monty-Hall problem which became famous by Gardner columns in the early
1990s (see [569]) was cool to teach in 1994, three years after the infamous “parade
column" of 1991 by Marilyn vos Savant which blew it into the spotlight. But especially after
a cameo in the movie “21", the theorem has become part of mathematical kitsch. I myself
love mathematical kitsch. A topic that gained that status must have been nice and innovative
to obtain that label. Kitsch becomes tiresome only if it is not presented in a new and
original form. The book [533], in the context of complex dynamics, remains a masterpiece still
today, even though the pictures have become only too familiar, but rendering the Mandelbrot set
today in that same way hardly rocks the boat any more. Still, it remains fascinating
and more and youtube allows to see sophisticated zooms down to the size of 10~?°°. In that 
context, it appears strange that mathematicians do not jump on the “Mandelbulb set" M, a
three dimensional version of the Mandelbrot set which is one of the most beautiful mathematical
objects. The reason could be that as a “youtube star" it is not yet worthy of any serious academic
consideration; more likely however is that the object is just too difficult for a serious study,
as we lack the mathematical analytic tools which would be needed just to answer a basic
question like whether M is connected. A second example is catastrophe theory [549] [668],
a beautiful part of singularity theory which started with Hassler Whitney and was then
developed by René Thom [657]. It was hyped so much that it took a deep fall from which it
has not yet fully recovered. This happened despite the fact that Thom himself already pointed
out the limits, as well as the controversies of the theory [81]. It had to pay a price for its fame
and appears to be forgotten. Chaos theory from the 1960s, which started to peak with Edward
Lorenz and terms like the “Butterfly effect" or “strange attractors", started to become a cliché
at the latest after that infamous scene featuring the character Ian Malcolm in the 1993 movie
Jurassic Park. It was laughed at already within the same movie franchise, when in the third Jurassic
Park installment of 2001, the kid Erik Kirby scoffs at Malcolm’s “preachiness" and quotes his
statement “everything is chaos" in a condescending way. In art, architecture, music, fashion or
design also, if something has become too popular, it is despised by the “connoisseurs". Hardly
anybody would consider a “lava lamp" (invented in 1963) an object of taste nowadays, even though
the fluid dynamics and motion is objectively rich and interesting, illustrating also geometric
deformation techniques in geometry like the Ricci flow. The piano piece “Für Elise" by Ludwig
van Beethoven became so popular that it can not even be played any more as background music
in a supermarket. There is something which prevents a “serious music critic" from admitting that
the piece is great, genius due to its simplicity. Such examples suggest that it might be better
for an achievement (or theorem in mathematics) not to enter pop-culture as this indicates a 
lack of “deepness" and is therefore despised by the elite. The principle of having fame torn 
down to disgrace is common also outside of mathematics. Famous actors, entrepreneurs or
politicians are not universally admired but sometimes hated, or torn to pieces, and
certainly can hardly live normal lives any more. The phenomenon of accumulated critique
got amplified with mob type phenomena in social media. There must be something fulfilling
in trashing achievements, the simplest explanation being envy. Film critics are often harsh and
judge negatively because this elevates their own status as they appear to have a “high standard".
Similarly, moral judgement is often expressed just to elevate the status of the judge, even though
experience has shown that often judges are offenders themselves and the critique turns out to
be a compensation. Maybe it is also human “Schadenfreude", or greed which makes so many
voice critique. History has shown however that social value systems do not matter much
in the long term. A good and rich theory will show its true value if it is appreciated also in
hundreds of years, when fashion and social influence no longer have any impact. The theorem
of Pythagoras will be important independent of fame and even if it has become a cliché, it is
too important to be labeled as such. It has not only earned the status of kitsch, it is also a
prototype as well as a useful tool.


MEDIA 


There is no question that the Pythagorean theorem, the Euler polyhedron formula
$\chi = v - e + f$, the Euler identity $e^{i\pi} + 1 = 0$, or the Basel problem formula
$1 + 1/4 + 1/9 + 1/16 + \cdots = \pi^2/6$ will always rank highly in any list of beautiful formulas. Most mathematicians
agree that they are elegant and beautiful. These results will also in the future keep top spots in
any ranking. On social networks, one can find lists of favorite formulas. On “Quora", one can
find the arithmetic mean-geometric mean inequality $\sqrt{ab} \leq (a+b)/2$ or the geometric
summation formula $1 + a + a^2 + \cdots = 1/(1-a)$ high up. One can also find strange contributions
in social media like the identity $1 = 0.99999\ldots$, which is used by Piaget inspired educators
to probe the mathematical maturity of kids. Similarly as in Piaget’s experiments, there is a time of
mathematical maturity where a student starts to understand that this is an identity. A very
young student thinks 1 is larger than 0.9999... even when asked to point out a number in between.
Such threshold moments can be crucial, for example for mathematical success later. We have a
strange fascination with “wunderkinds", kids for which some mathematical abilities have come
earlier (even though the existence of each wonder kid produces devastating collateral damage in
its neighborhood, as their success sucks out any motivation of immediate peers). The problem
is also that if somebody does not pass these Piaget thresholds early, teachers and parents
consider them lost; they get discouraged and become uninterested in math (the situation in other
arts or sports is similar). In reality, slow learners for which the thresholds are passed later are
often deeper thinkers and can produce deeper or more extraordinary results. At the moment,
searching for the “most beautiful formula in mathematics" gives the Euler identity and search 
engines agree. But the concept of taste in a time of social media can be confusing. We live in 
an epoch where a 17 year old “social influencer" can in a few days gather more “followers" and
become more widely known than Sophie Kovalewskaya, who made fundamental, beautiful
and lasting contributions in mathematics and physics like the Cauchy-Kovalevskaya theorem.
Such a theorem is definitely more lasting than a few “selfie shots" of a pretty face, but
measured by a “majority vote", it would not only lose, it would completely disappear. One can find
YouTube videos of kids explaining the 4th dimension which are watched millions of times, many
thousand times more than videos of mathematicians who have created deep new mathematical
insight about four dimensional space. But time rectifies. Kovalewskaya will also be ranked
highly in 50 years, while the pretty face will have faded. Hardy put this even more extremely by
comparing a mathematician with a literary heavyweight: Archimedes will be remembered when
Aeschylus is forgotten, because languages die and mathematical ideas do not. There is no
doubt that film and TV (and now the internet with “YouTube", social networks and “blogs") have a
great short-term influence on the value or exposure of a mathematical field. Examples of movies
with influence are It is my turn (1980) or Antonia’s line (1995) featuring some algebraic
topology, Good will hunting (1997) in which some graph theory and Fourier theory appear, and
21 (2008), which has a scene in which the Monty Hall problem has a cameo. The man
who knew infinity displays the work of Ramanujan and promotes some combinatorics like
the theory of partitions. There are lots of movies featuring cryptology like Sneakers (1992), 
Breaking the code (1996), Enigma (2001) or The imitation game (2014). For TV, math- 
ematics was promoted nicely in Numb3rs (2005-2010). For more, see or my own online 
math in movies collection. 


PROFESSIONAL OPINIONS 


Interviews with professional mathematicians can also probe the waters. In [416], Natasha
Kondratieva asked a number of mathematicians: “What three mathematical formulas are the
most beautiful to you?" The formulas of Euler or the Pythagoras theorem naturally were
ranked high. Interestingly, Michael Atiyah even included a formula “Beauty = Simplicity +
Depth". Also other results, like the Leibniz series $\pi/4 = 1 - 1/3 + 1/5 - 1/7 + 1/9 - \cdots$, the
Maxwell equations $dF = 0, d^*F = J$ or the Schrödinger equation $i\hbar u' = (i\hbar\nabla + eA)^2 u + Vu$,
the Einstein formula $E = mc^2$ or Euler’s golden key $\sum_{n=1}^{\infty} 1/n^s = \prod_p (1 - 1/p^s)^{-1}$
or the Gauss identity $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$ or the volume of the unit ball in $\mathbb{R}^{2n}$ given as
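$\pi^n/n!$ appeared. Gregory Margulis mentioned an application of the Poisson summation
formula $\sum_n f(n) = \sum_n \hat{f}(n)$, which for the Gaussian $f(x) = e^{-x^2}$ and the convention
$\hat{f}(k) = \int f(x) e^{-2\pi i k x}\,dx$ reads $\sum_n e^{-n^2} = \sqrt{\pi} \sum_n e^{-\pi^2 n^2}$, or the quadratic reciprocity
law $(p|q)(q|p) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$ for distinct odd primes $p, q$, where $(p|q) = 1$ if $q$ is a quadratic residue modulo $p$ and $-1$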
else. Robert Minlos gave the Gibbs formula, a Feynman-Kac formula or the Stirling
formula. Yakov Sinai mentioned the Gelfand-Naimark realization of an Abelian C* algebra
as an algebra of continuous functions or the second law of thermodynamics. Anatoly
Vershik gave the generating function $\prod_{k=1}^{\infty} (1 - x^k)^{-1} = \sum_{n=0}^{\infty} p(n) x^n$ for the partition function
$p(n)$ and the generalized Cauchy inequality between arithmetic and geometric mean. An
interesting statement of David Ruelle appears in that article; he quoted Grothendieck: “my
life’s ambition as a mathematician, or rather my joy and passion, have constantly been
to discover obvious things ...". Combining Grothendieck’s and Atiyah’s quotes, fundamental
theorems should be “obvious, beautiful, simple and still deep".
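
Several of the formulas quoted above can be checked numerically. The following is a minimal Python sketch, not part of the cited interviews; the function names, truncation lengths and tolerances are illustrative assumptions. It compares partial sums and products with their closed forms for the Leibniz series, the Euler golden key at s = 2, the Gauss identity and the volume formula for the unit ball in even dimension.

```python
import math

def leibniz(terms):
    """Partial sum of 1 - 1/3 + 1/5 - ... which converges (slowly) to pi/4."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_p in enumerate(sieve) if is_p]

def zeta_sum(s, terms):
    """Truncated sum over 1/n^s (left side of the golden key)."""
    return sum(1 / n ** s for n in range(1, terms + 1))

def euler_product(s, bound):
    """Truncated product over primes p <= bound of 1/(1 - p^-s) (right side)."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod *= 1 / (1 - p ** (-s))
    return prod

def gauss_integral(half_width=10.0, steps=20000):
    """Midpoint rule for the integral of exp(-x^2) over [-half_width, half_width]."""
    h = 2 * half_width / steps
    return h * sum(math.exp(-((-half_width + (i + 0.5) * h) ** 2)) for i in range(steps))

def ball_volume_even(n):
    """Volume pi^n / n! of the unit ball in dimension 2n."""
    return math.pi ** n / math.factorial(n)

print(leibniz(100000), math.pi / 4)
print(zeta_sum(2, 10000), euler_product(2, 10000), math.pi ** 2 / 6)
print(gauss_integral(), math.sqrt(math.pi))
print(ball_volume_even(1), ball_volume_even(2))  # unit disk area, unit 4-ball volume
```

The truncated sums only approximate the limits, so the printed pairs agree to a few digits rather than exactly.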

A recent column “Roots of unity" in the Scientific American asks mathematicians for their fa- 
vorite theorem: examples are Noether’s theorem, the uniformization theorem, the Ham 
Sandwich theorem, the fundamental theorem of calculus, the circumference of the 
circle, the classification of compact 2-surfaces, Fermat’s little theorem, the Gromov 
non-squeezing theorem, a theorem about Betti numbers, the Pythagorean theorem, the 
classification of Platonic solids, the Birkhoff ergodic theorem, the Burnside lemma, 
the Gauss-Bonnet theorem, Conway’s rational tangle theorem, Varignon’s theorem, an
upper bound on Reidemeister moves in knot theory, the asymptotic number of relatively
prime pairs, the Mittag-Leffler theorem, a theorem about spectral sparsifiers, the
Yoneda lemma and the Brouwer fixed point theorem. These interviews illustrate also
that the choices are different if one asks for a “personal favorite theorem" or an “objectively favorite
theorem".


FUNDAMENTAL VERSUS IMPORTANT


Asking for fundamental theorems is different from asking for “deep theorems" or “important
theorems". Examples of deep theorems are the Atiyah-Singer or Atiyah-Bott theorems in
differential topology, the KAM theorem related to the strong implicit function theorem, or
the Nash embedding theorem in Riemannian geometry. Another example is the
Gauss-Bonnet-Chern theorem in Riemannian geometry or the Pesin theorem in partially
hyperbolic dynamical systems. Maybe the shadowing lemma in hyperbolic dynamics is more
fundamental than the much deeper Pesin theorem (which is still too complex to be proven 
with full details in any classroom. Also excellent textbooks like do not prove the 
full theorem establishing the Bernoulli property on ergodic components). One can also argue, 
whether the “theorema egregium" of Gauss, stating that the curvature of a surface is intrinsic 
and not dependent on an embedding is more “fundamental" than the “Gauss-Bonnet" result, 
which is definitely deeper. In number theory, one can argue that the quadratic reciprocity
formula is deeper than the little theorem of Fermat or the Wilson theorem. (The latter
gives an if and only if criterion for primality but is still far less important than the little theorem
of Fermat, which is used in many applications.) The last theorem of Fermat
is an example of an important theorem as it is deep and related to other fields and culture, 
but it is not yet so much a “fundamental theorem". Similarly, the Perelman theorem fixing 
the Poincaré conjecture is important, but it is not (yet) a fundamental theorem. It is still a 
mountain peak and not a sediment in a rock. Important theorems are not much used by other 
theorems as they are located at the end of a development. Also the solution to the Kepler 
problem on sphere packings or the proof of the 4-color theorem [120] or the proof of the 
Feigenbaum conjectures [159] [344] are important results but not so much used by other 
results. Important theorems build the roof of the building, while fundamental theorems form 
the foundation on which a building can be constructed. But this can depend on time as what 
is the roof today, might be in the foundation later on, once more floors have been added. 


ESSENTIAL MATH 


In education it is necessary to reexamine regularly what a student of mathematics needs to
know. What are essential fields in mathematics? Also here, there are many opinions and
things are always in flux. The 7 liberal arts of sciences was an early attempt to organize
things in a larger scale. For example, while in the 19th century, quaternions were considered 
essential, they fell out of the curriculum and today, it is well possible that a student learns 
about division algebras only in graduate school. One of the questions is how to balance appli- 
cability and elegance. In pure mathematics, one might more focus on beauty and elegance, in 
applied mathematics, the applicability is important. As the field of mathematics has expanded 
enormously, there is the problem of fragmentation. On the other hand, the mathematical fields 
have also split. Some domains have been “taken over" by new departments like applied mathe- 
matics, computer science or statistics. Discrete mathematics courses like graph theory or theory 
of computation or cryptology are now in the hands of computer science, differential equations 
or numerical analysis by applied mathematics departments, probability theory courses taught 
by statistics departments. Still, there is a core of mathematical content which a mathemati- 
cian should at least have been exposed to. A student studying the subject should probably 
have an eye on both getting into a field which looks promising for research as well as having a 
broad general education in all possible fields. One can get an idea of what is required in various
mathematics departments by looking at what are called “general examinations" or “qualifying
examinations". These are exams given to first year graduate students which have to be passed.
Departments like Harvard or Princeton have made many of these questions public.
Also here, one could go to the AMS classification and grind through all topics. Instead, let us
attempt to put it all in one box, being aware that other priorities can work too:


Pre-calculus      | Algebra, Trig functions, Log and Exp Functions, Graphs, Modeling, Geometry, Solving equations, Inequalities
Single variable   | Functions, Limits, Continuity, Differentiation, Integration, Series, Differential equations, Fundamental theorem
Multi variable    | Vectors, Geometry, Functions, Differentiation, Integration, Vector calculus: Green, Stokes and Gauss
Linear algebra    | Linear equations, Determinants, Eigenvalues, Projection and Data-fitting, Differential equations, Fourier theory
Dynamical systems | Iteration of maps, Ordinary and partial differential equations, Bifurcation theory, Integrability, Ergodic Theory
Probability       | Probability spaces, Random variables, Distributions, Stochastic Processes, Statistics, Data, Estimation
Discrete math     | Combinatorics, Graphs, Order structures, Counting tools, Theory of computation, Complexity, Game Theory
Numerics          | Algorithms, Integration, Solving ODE’s, PDE’s, Approximation techniques, Interpolation, Comput. Geometry
Analysis          | Functional analysis, Banach algebras, Complex analysis, Harmonic analysis, Fourier theory, Laplace, PDE’s
Algebra           | Groups and Rings, Modules, Vector Spaces, Commutative algebra, Non-commutative Rings, Galois theory
Number theory     | Primes, Diophantine equations and approximations, Geometry of numbers, Dirichlet Series, Zeta function
Geometry          | Differential topology, Differential Geometry, Geodesics, Curvature, Invariants, Geometric Measure theory
Alg. Geometry     | Affine and Projective varieties, Ringed spaces, Schemes, Sheaf Theoretical Methods, Cohomology, Categories
Topology          | Set theoretical topology, Fractal Geometry, Differential topology, Homotopy, Algebraic Topology, Topos theory
Logic             | First/second order Logic, Foundations, Models, Incompleteness, Forcing, Computability, New Axiom systems
Real analysis     | Foundations, Metric spaces, Measure theory, Theory of integration on delta rings, Non-standard analysis
Computer Science  | Math software, Programming Paradigms, Computer Architecture, Data structures, Big Data, Machine Learning
Connections       | History, Big picture, Number systems, Notation, Linguistic, Psychology, Philosophy, Sociology and Pedagogy


OPEN PROBLEMS 


The importance of a result is also related to open problems attached to the theorem. Open
problems fuel new research and new concepts. Of course this is a moving target, but any “value
functional" for “fundamental theorems" is time dependent and also a bit a matter of fashion
and entertainment (TV series like “Numb3rs" or Hollywood movies like “Good will hunting"
changed the value), and it is under the influence of big shot mathematicians who serve as
“influencers". Some of the problems have prizes attached, like the 23 problems of Hilbert, the
15 problems of Simon [603], the 18 problems of Smale, the Yau problems in geometry
[708], the 7 Millennium problems or the four Landau problems (Goldbach conjecture,
twin prime conjecture, the existence of primes between consecutive squares and the existence
of infinitely many primes of the form $n^2 + 1$) and then the oldest problem of mathematics,
the existence of odd perfect numbers.

There are beautiful open problems in any major field and building a ranking would be as difficult 
as the problem to rank theorems. It is a bit a personal matter. I like the odd perfect number 
problem because it is the oldest problem in mathematics. Also Landau’s list of 4 problems 
are clearly on the top. They are shockingly short and elementary but brutally hard, having 
resisted more than a century of attacks by the best minds. There are other problems, where 
one believes that the mathematics has just not been developed yet to tackle them, an example
being the Collatz (3k+1) problem. With respect to the Millennium problems, one could argue
that the Yang-Mills gap problem is rather vague. The problem looks like “made by humans"
while a problem like the odd perfect number problem has been “made by the gods".

There appears to be wide consensus that the Riemann hypothesis is the most important open
problem in mathematics. It states that the nontrivial roots of the Riemann zeta function are all located
on the line Re(z) = 1/2. In number theory, the twin prime problem or the Goldbach
problem have a high exposure because they can be explained to a general audience without 
mathematics background. For some reason, an equally simple problem, the Landau problem 
asking whether there are infinitely many primes of the form $n^2 + 1$, is much less well known. In
recent years, an alleged proof of the ABC conjecture by Shinichi Mochizuki, using a new
theory called Inter-Universal Teichmüller Theory (IUT), has so far not been accepted by the
main mathematical community despite strong efforts. But it has put the ABC conjecture from
1985 in the spotlight, like [701]. It has been described in [254] as the most important problem
in Diophantine equations. It can be expressed using the quality $Q(a,b,c)$ of three integers
$a, b, c$ which is $Q(a,b,c) = \log(c)/\log(\mathrm{rad}(abc))$, where the radical $\mathrm{rad}(n)$ of a number $n$ is the
product of the distinct prime factors of $n$. The ABC conjecture is that for any real number $q > 1$
there exist only finitely many triples $(a,b,c)$ of positive relatively prime integers with $a + b = c$
for which $Q(a,b,c) > q$. The triple with the highest quality so far is $(a,b,c) = (2, 3^{10} \cdot 109, 23^5)$;
its quality is $Q = 1.6299$. And then there are entire collections of conjectures, one being the
Langlands program which relates different parts of mathematics like number theory, algebra, 
representation theory or algebraic geometry. I myself can not appreciate this program yet 
because I need first to understand it. My personal favorite problem is the entropy problem 
in smooth dynamical systems theory [367]. The Kolmogorov-Sinai entropy of a smooth 
dynamical system can be described using Lyapunov exponents. For many systems like smooth 
convex billiards, one measures positive entropy but is unable to prove it. An example is the
real analytic $l^4$ table $x^4 + y^4 = 1$ [852]. For ergodic theory, see [162] [231] [608].
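
The quality of an ABC triple defined above is easy to compute. Here is a small Python sketch under the stated definitions; the helper names `radical` and `quality` are illustrative choices, and the radical is found by plain trial division, which is adequate only for small inputs like the record triple mentioned in the text.

```python
import math

def radical(n):
    """Product of the distinct prime factors of n, found by trial division."""
    rad, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            rad *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:  # whatever remains is a single prime factor
        rad *= n
    return rad

def quality(a, b, c):
    """Quality Q(a,b,c) = log(c) / log(rad(abc)) of a coprime triple with a + b = c."""
    if a + b != c or math.gcd(a, b) != 1:
        raise ValueError("need coprime a, b with a + b = c")
    return math.log(c) / math.log(radical(a * b * c))

a, b, c = 2, 3**10 * 109, 23**5
print(a + b == c)                   # True
print(radical(a * b * c))           # 15042 = 2 * 3 * 23 * 109
print(round(quality(a, b, c), 4))   # 1.6299
```

A generic triple like (1, 2, 3) has quality log(3)/log(6), well below 1; triples with quality above 1 require a, b and c to be unusually rich in repeated prime factors.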


CLASSIFICATION RESULTS 


One can also see classification theorems like the above mentioned Gelfand-Naimark realization 
as mountain peaks in the landscape of mathematics. Examples of classification results are 
the classification of regular or semi-regular polytopes, the classification of discrete subgroups of 
a Lie group, the classification of “Lie algebras", the classification of “von Neumann algebras", 
the “classification of finite simple groups", the classification of Abelian groups, or the
classification of finite dimensional associative division algebras over the reals, which by Frobenius
are given by the real, complex or quaternion numbers. Not only in algebra, also in differential topology, one would
like to have classifications like the classification of d-dimensional manifolds. In topology, an 
example result is that every Polish space is homeomorphic to some subspace of the Hilbert 
cube. Related to physics is the question what “functionals" are important. Uniqueness results 
help to render a functional important and fundamental. Valuations of fields are classified, in
the case of the rational numbers, by Ostrowski’s theorem: every nontrivial valuation over the rational
numbers is, up to equivalence, either the absolute value or a p-adic norm. The Euler characteristic for example
can be characterized as the unique valuation on simplicial complexes which assumes the value
1 on simplices, or as the functional which is invariant under Barycentric refinements. A theorem of
Claude Shannon [598] identifies the Shannon entropy as the unique functional on probability
spaces which is compatible with additive and multiplicative operations on probability spaces and
satisfies some normalization condition.


BOUNDS AND INEQUALITIES 


Another class of important theorems are best bounds like the Hurwitz estimate stating that
for every irrational $x$ there are infinitely many $p/q$ for which $|x - p/q| < 1/(\sqrt{5} q^2)$. In packing problems, one wants
to find the best packing density, like for sphere packing problems. In complex analysis, one
has the maximum principle, which assures that a non-constant harmonic function f can not have a local
maximum in its domain of definition. One can argue for including this as a fundamental theorem
as it is used by other theorems like the Schwarz lemma (named after Hermann Amandus
Schwarz) from complex analysis which is used in many places. In probability theory or statistical
mechanics, one often has thresholds, where some phase transition appears. Computing these 
values is often important. The concept of maximizing entropy explains many things like 
why the Gaussian distribution is fundamental as it maximizes entropy. Measures maximizing 
entropy are often special and often equilibrium measures. This is a central topic in statistical 
mechanics [574]. In combinatorial topology, the upper bound theorem was a milestone.
It was long a conjecture of Theodore Motzkin and was then proven by Peter McMullen for convex
polytopes and extended by Richard Stanley to simplicial spheres: cyclic polytopes maximize the
number of faces in the class of polytopes of a given dimension with a given number of vertices.
Fundamental are also some inequalities like the Cauchy-Schwarz inequality $|a \cdot b| \leq |a||b|$
or the Chebyshev inequality $P[|X - E[X]| \geq |a|] \leq \mathrm{Var}[X]/a^2$. In complex analysis, the
Hadamard three circle theorem is important as it bounds the maximum of $|f|$ on an
intermediate circle by the maxima on the two boundary circles,
for a holomorphic function f defined on an annulus given by two concentric circles. Often
inequalities are more fundamental and powerful than equalities because they are more widely 
used. Related to inequalities are embedding theorems like Sobolev embedding theorems. 
For more inequalities, see [97]. Apropos embedding, there are the important Whitney or Nash 
embedding theorems which are appealing. 
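
The Hurwitz estimate can be illustrated on a concrete irrational number. The following Python sketch assumes $x = \sqrt{2}$, whose continued fraction is [1; 2, 2, 2, ...]; the function names are illustrative. In general Hurwitz only guarantees infinitely many good fractions, but for $\sqrt{2}$ every continued fraction convergent happens to satisfy the bound.

```python
import math

def sqrt2_convergents(count):
    """Yield the first `count` continued fraction convergents p/q of sqrt(2)."""
    p, q = 1, 1
    for _ in range(count):
        yield p, q
        # next convergent of [1; 2, 2, 2, ...]
        p, q = p + 2 * q, p + q

def satisfies_hurwitz(x, p, q):
    """Check the Hurwitz bound |x - p/q| < 1/(sqrt(5) q^2)."""
    return abs(x - p / q) < 1 / (math.sqrt(5) * q * q)

x = math.sqrt(2)
convergents = list(sqrt2_convergents(10))
hits = [(p, q) for p, q in convergents if satisfies_hurwitz(x, p, q)]

for p, q in convergents:
    # the scaled error q^2 |x - p/q| stays below 1/sqrt(5) = 0.447...
    print(p, q, q * q * abs(x - p / q))
```

A fraction that is not a convergent, such as 4/3, fails the test, which shows that the bound is a genuine restriction.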


BIG IDEAS 


Classifying and valuing big ideas is even more difficult than ranking individual theorems.
Examples of big ideas are the idea of axiomatisation, which started with planar geometry and
number theory as described by Euclid, and the concept of proof or later the concept of
models. Archimedes’ idea of comparison, leading to ideas like the Cavalieri principle, integral
geometry or measure theory. René Descartes’ idea of coordinates, which allowed to work on
geometry using algebraic tools, the use of infinitesimals and limits leading to calculus, allow- 
ing to merge concepts of rate of change and accumulation, the idea of extrema leading to the 
calculus of variations or Lagrangian and Hamiltonian dynamics or descriptions of fundamental 
forces. Maximizing quantities like entropy leads to fundamental distributions like the Gaussian,
exponential, Binomial or uniform distributions. Cantor’s set theory allows for a universal
simple language to cover all of mathematics, the Klein Erlangen program of “classifying and
characterizing geometries through symmetry". The abstract idea of a group or more general
mathematical structures like monoids. The concept of extending number systems like
completing the real numbers or extending them to the quaternions and octonions or then producing
p-adic numbers or hyperreal numbers. The concept of complex numbers or more
generally the idea of completion of a field. The idea of logarithms [623]. The idea of Galois to
relate problems about solving equations with field extensions and symmetries. The idea of 
equivalence classes is used when looking at projective spaces or ideals. The idea of seeing 
prime ideals as a more fundamental replacement for “maximal ideal" or “point", leading to the 
notion of spectrum of a ring and by gluing to the notion of schemes vastly expanding classical 
algebraic geometry. The Grothendieck program of “geometry without points" or “locales" 
as topologies without points in order to overcome shortcomings of set theory. This led to new
objects like schemes or topoi. Central in algebra, geometry and number theory is the idea
of localization, which allows to extend a ring so that one can start “dividing", the prototype
being the field of fractions like the construction of rational functions from polynomials.
Another basic big idea is the concept of duality, which appears in many places like in projective
geometry, in polyhedra, Poincaré duality or Pontryagin duality or Langlands duality for
reductive algebraic groups. The idea of dimension to measure topological spaces numerically
leading to fractal geometry. The idea of almost periodicity is an important
generalization of periodicity. Crossing the boundary of integrability leads to the important paradigm
of stability and randomness [501] and the interplay of structure and randomness [651]. These
themes are related to harmonic analysis and integrability as integrability means that for 
every invariant measure one has almost periodicity. It is also related to spectral properties in 
solid state physics or via Koopman theory in ergodic theory or then to fundamental new 
number systems like the p-adic numbers: the p-adic integers form a compact topological 
group on which the translation is almost periodic. It also leads to problems in Diophantine 
approximation. The concept of algorithm and building the foundation of computation using
precise mathematical notions. The use of algebra to tackle problems in topology, starting with
mathematicians like Kirchhoff, Betti, Poincaré or Emmy Noether. Another important principle
is to reduce a problem to a fixed point problem. This often leads to universality like for
the central limit theorem (where the Gaussian distribution is the fixed point). The categor- 
ical approach is not only a unifying language but also allows for generalizations of concepts 
allowing to solve problems. Examples are generalizations of Lie groups in the form of group 
schemes. Then there is the deformation idea which was used for example in the Perelman 
proof of the Poincaré conjecture. Deformation often comes in the form of partial differ- 
ential equations and in particular heat type equations. Deformations can be abstract in the 
form of homotopies or more concrete by analyzing concrete partial differential equations like 
the mean curvature flow or Ricci flow. Another important line of ideas is to use
probability theory to prove results, even in combinatorics. A probabilistic argument can often give
existence of objects which one can not even construct. An example is a sequence of
simplicial complexes $G_n$ with n nodes for which the Euler characteristic $\chi(G_n) = \sum_x (-1)^{\dim(x)}$
is exponentially large in n. The idea of non-commutative geometry generalizing geometry
through functional analysis or the idea of discretization which leads to numerical methods or
computational geometry. The power of coordinates allows to solve geometric problems more
easily. The above mentioned examples have all proven their use. Grothendieck’s ideas have led
to the solution of the Weil conjectures, fixed point theorems were used in game theory
(first by Nash) and can be used to prove uniqueness of solutions of differential equations. They are also
used to justify perturbation theory using renormalization schemes or iterative methods like in
the KAM theorem about the persistence of quasi-periodic motion leading to hard implicit
function theorems. In the end, what really counts is whether the big idea can solve practical
problems or whether it can be used to prove new theorems (or reprove old theorems more elegantly). The
history of mathematics clearly shows that abstraction for the sake of abstraction or for the sake 
of generalization rarely was able to convince the mathematical community initially. But it can 
also happen that the break-through of a new theory or generalization only pays off much later 
and that a subtle generalization actually pushes the tool into a realm where it can be used in 
other contexts. A big idea might have to age like a good wine. 


PARADIGMS 


There is once in a while an idea which completely changes the way we look at things. These 
are paradigm shifts as described by the philosopher and historian Thomas Kuhn who relates 
it also to scientific revolutions [427]. For mathematics, there are various places, where 
such fundamental changes happened: the introduction of written numbers which happened 
independently in various different places. An early example is the tally mark notation on
tally sticks (early sources are the Lebombo bone from 40 thousand years ago or the Ishango
bone from 20 thousand years ago) or the technology of talking knots, the khipu [666], which
is a topological writing which flourished in the Tawantinsuyu, the Inka empire. Another
example of a paradigm change is the development of proof, which required the insight that
some mathematical statements are assumed as axioms from which, using logical deduction, 
new theorems are proven. Also proof assistant frameworks like SAM [835], ACL2 [455], 
Coq [669], Isabelle [646], Lean (extended to Xena in an educational setting) have emerged 
allowing to build in more reliability and accountability to proofs. The fact that axiom 
systems can be deformed like from Euclidean to non-Euclidean geometry was definitely a 
paradigm change. On a larger scale, the insight that even the axiom systems of mathematics
can be deformed and extended in various ways came only in the 20th century with Gödel.
Before that, one was under the impression that one could base all of mathematics on a universal
axiom system. This was Hilbert’s program [709]. A third example of a paradigm change is the
introduction of the concept of functions, which came surprisingly late. The modern concept
of a function which takes a quantity and assigns to it a new quantity came only late in the 19th
century with the development of set theory, which is a paradigm change too. There had been
a long struggle also with understanding limits, which puzzled already Greek mathematicians
like Zeno but which really only became solid with clear definitions by Weierstrass and then
with the concept of topology where the concept of limit is absorbed within set theory, for
example using the notion of filters. Related to functions is the use of functions to understand 
combinatorial or number theoretical problems, like through the use of generating functions, 
or Dirichlet series, allowing analytic tools to solve discrete problems like the existence of
primes in arithmetic progressions. The opposite, the use of discrete structures like finite groups
to understand the continuum like in Galois theory, is another example of a paradigm change. It
led to the insight that the quadrature of the circle or angle trisection can not be done with
ruler and compass. There are various other places, where paradigm changes happened. A 
nice example is the axiomatization of probability theory by Kolmogorov or the realization that 
statistics becomes a geometric theory if random variables are seen as vectors in a vector 
space: the correlation between two random variables is the cosine of the angle between centered 
versions of these random variables. Paradigm changes which are really fundamental can be 
surprisingly simple. An example is the Connes formula [187] which is based on the simple 
idea that distance can be measured by extremizing slope. This allows to push traditional 
geometry into non-commutative settings or discrete settings, where a priori no metric (notion
of distance) is given. Another example is the extremely simple but powerful idea of the
Grothendieck extension of a monoid to a group. It has been used throughout the history 
of mathematics to generate new number systems starting with getting integers from natural 
numbers, rational numbers from integers, complex numbers from real numbers or quaternions 
from complex numbers, or the construction of surreal numbers or games generalizing numbers. 
The idea is also used in dynamical systems theory to generate from a not necessarily invertible 
dynamical system an invertible dynamical system by extending time from a monoid to a group. 
In the context of Grothendieck, one should also mention that category theory, similarly to set 
theory at the beginning of the last century, changed the way mathematics is done and extended. 
Like the switch from relational databases to graph databases, it is a paradigm change 
stressing more the relations (arrows) between objects (nodes) and not only the objects (sets) 
themselves. 
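
The Grothendieck extension mentioned above can be sketched in a few lines, assuming the monoid is the natural numbers under addition. An integer is modeled as a pair (a, b) standing for the formal difference a - b, and two pairs (a, b), (c, d) are identified when a + d = c + b. The function names are hypothetical and chosen only for this illustration.

```python
def embed(n):
    """Embed a natural number n into the group as the pair (n, 0)."""
    return (n, 0)


def equivalent(p, q):
    """(a, b) ~ (c, d) iff a + d == c + b; only monoid addition is used."""
    (a, b), (c, d) = p, q
    return a + d == c + b


def add(p, q):
    """Componentwise addition is well defined on equivalence classes."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)


def negate(p):
    """The inverse of the formal difference a - b is b - a."""
    a, b = p
    return (b, a)


# 3 + (-5) is represented by (3, 5), which is equivalent to (0, 2), that is -2.
three_minus_five = add(embed(3), negate(embed(5)))
```

Every element now has an inverse, since add(p, negate(p)) is always equivalent to embed(0). The same pair construction with multiplication instead of addition produces the positive rationals from the positive integers.
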
TAXONOMIES 


When looking at mathematics overall, taxonomies are important. They not only help to 
navigate the landscape, they are also interesting from a pedagogical as well as historical point 
of view. I borrow here some material from my course Math E 320 which is so global that 
a taxonomy is helpful. Organizing a field using markers is also important when teaching 
intelligent machines, a field which can be seen as the pedagogy for AI. The big bulk of the work 
was to teach a bot mathematics, which means to fill in thousands of entries of knowledge. 
It can appear a bit mind numbing as it is a task similar to writing a dictionary. But 
writing things down for a machine actually is even tougher than writing things down for a 
student. We can not assume the machine to know anything it is not told. This document 
about fundamental theorems by the way could relatively easily be adapted into a database of 
“important theorems". It actually is one of my aims to feed it eventually to the Sofia bot. If 
the machine is asked about “important theorems in mathematics", it should be well informed, 
even though it is just a “stupid" encyclopedic data entry. Historically, when knowledge was still 
sparse, one has classified teaching material using the liberal arts and sciences, the trivium: 
grammar, logic and rhetoric, as well as the quadrivium: arithmetic, geometry, music, and 
astronomy. More specifically, one has built the eight ancient roots of mathematics which 
are tied to activities: counting and sorting (arithmetic), spacing and distancing (geometry), 
positioning and locating (topology), surveying and angulating (trigonometry), balancing and 
weighing (statics), moving and hitting (dynamics), guessing and judging (probability) and 
collecting and ordering (algorithms). This leads then to topics like Arithmetic, Geometry, 
Number Theory, Algebra, Calculus, Set theory, Probability, Topology, Analysis, Numerics, 
Dynamics and Algorithms. The AMS classification is much more refined and distinguishes 
64 fields. The Bourbaki point of view partitions mathematics into algebraic 
and differential topology, differential geometry, ordinary differential equations, ergodic theory, 
partial differential equations, non-commutative harmonic analysis, automorphic forms, analytic 
geometry, algebraic geometry, number theory, homological algebra, Lie groups, abstract groups, 
commutative harmonic analysis, logic, probability theory, categories and sheaves, commutative 
algebra and spectral theory. What are hot spots in mathematics? Michael Atiyah 
distinguished parameters like local - global, low and high dimensional, commutative - 
non-commutative, linear - nonlinear, geometry - algebra, physics and mathematics. 


KEY EXAMPLES 


The concept of experiment came even earlier and has always been part of mathematics. Experiments 
allow one to find good examples and set the stage for a theorem. (To quote Vladimir Arnold: 
“Mathematics is a part of physics where experiments are cheap".) Obviously, the theorem 
can not contradict any of the examples. But examples are more than just a tool to falsify statements; 
a good example can be the seed for a new theory or for an entire subject. Here are 
a few examples: in smooth dynamical systems the Smale horseshoe comes to mind, in 
differential topology the exotic spheres of Milnor, in one-dimensional dynamics the logistic 
map, or Hénon map, in perturbation theory of Hamiltonian systems the Standard 
map featuring KAM tori or Mather sets, in homotopy theory the dunce hat or Bing house, 
in combinatorial topology the Rudin sphere, the Nash-Kuiper non-smooth embedding 
of a torus into Euclidean space, in topology there is the Alexander horned sphere or the 
Antoine necklace. In complexity theory there is the busy beaver problem in Turing computation, 
which illustrates how much one can achieve with small machines, in 
group theory there is the Rubik cube which illustrates many fundamental notions for finitely 
presented groups, in fractal analysis the Cantor set or the Menger sponge, in Fourier theory 
the series of f(x) = x mod 1, in Diophantine approximation the golden ratio, in the 
calculus of sums the zeta function, in dimension theory the Banach-Tarski paradox. In 
harmonic analysis, there is the Weierstrass function as an example of a nowhere differentiable 
function. Peano curves give concrete examples of a continuous surjection from an 
interval onto a square or cube. In complex dynamics not only the Mandelbrot set plays an 
important role, but also individual, specific Julia sets can be interesting. Examples like the 
Mandelbulb have not yet been investigated mathematically. In mathematical physics, the 
almost Mathieu operator [156] produced a rich theory related to spectral theory, Diophantine 
approximation, fractal geometry and functional analysis. Besides examples illustrating a 
typical case, it is also important to explore the boundary and limitations of a theorem or theory 
by looking at counterexamples. Collections of counterexamples exist in many fields like 
[244] [624] [554] [634] [699] [107] [390]. 


PHYSICS 


One can also make a list of great ideas in physics [192] and see the relations with the fundamental 
theorems in mathematics. A high applicability should then contribute to a value functional 
in the list of theorems. Great ideas in physics are the concept of space and time, meaning 
to describe physical events using differential equations. In cosmology, one of the insights was 
to understand the structure of our solar system and to get from an earth-centered to a heliocentric 
system, another is to look at space-time as a whole and realize the expansion of the universe 
or the idea of a big bang. More general is the Platonic idea that physics is geometry. 
Or calculus: Lagrange developed his calculus of variations to find laws of physics. Then 
there is the idea of Lorentz invariance and symmetries more generally, which leads to special 
relativity, there is the idea of general relativity which allows one to describe gravity through 
geometry and a larger symmetry seen through the equivalence principle. There is the idea of 
seeing elementary particles using Lie groups. There is the Noether theorem which is the idea 
that any symmetry is tied to a conservation law: translation symmetry leads to momentum 
conservation, rotation symmetry to angular momentum conservation for example. Symmetries 
also play a role in spontaneously broken symmetry or phase transitions. There is the 
idea of quantum mechanics, which mathematically means replacing ordinary differential equations with 
partial differential equations or replacing commutative algebras of observables with non-commutative 
algebras. An important idea is the concept of perturbation theory and in 
particular the notion of linearization. Many laws are simplifications of more complicated laws 
and are described in the simplest cases through linear laws like Ohm’s law or Hooke’s law. Quantization 
processes allow one to go from commutative to non-commutative structures. Perturbation 
theory then allows one to extrapolate from a simple law to a more complicated law. Some of this is an easy 
application of the implicit function theorem, some is harder, like KAM theory. There is the 
idea of using discrete mathematics to describe complicated processes. An example is the 
language of Feynman graphs or the language of graph theory in general to describe physics as 
in loop quantum gravity or the language of cellular automata, which can be seen as partial 
difference equations where also the function space is quantized. Then there is the idea of quantization, 
a formal transition from an ordinary differential equation like a Hamiltonian system to a partial 
differential equation, or the replacement of single particle systems with infinite particle systems (Fock). 
There are other quantization approaches through deformation of algebras which is related 
to non-commutative geometry. There is the idea of using smooth functions to describe 
discrete particle processes. An example is the Vlasov dynamical system or Boltzmann’s 
equation to describe a plasma, or thermodynamic notions to describe large sets of particles 
like a gas or fluid. Dual to this is the use of discretization to describe a smooth system by 
discrete processes. An example is numerical approximation, like using the Runge-Kutta 
scheme to compute the trajectory of a differential equation. There is the realization that we 
have a whole spectrum of dynamical systems, integrability and chaos and that some of the 
transitions are universal. Another example is the tight binding approximation in which 
a continuum Schrödinger equation is replaced with a bounded discrete Jacobi operator. 
There is the general idea of finding the building blocks or elementary particles. Starting 
with Democritus in ancient Greece, the idea got refined again and again. Once atoms were 
detected and charges found to be quantized (Robert Millikan), the structure of the atom was 
explored (Rutherford), and then the atom got split (Lise Meitner, Otto Hahn). The structure 
of the nuclei with protons and neutrons was then refined again using quarks, leading to the standard 
model in particle physics. There is furthermore the idea to use statistical methods 
for complex systems. An example is the use of stochastic differential equations like diffusion 
processes to describe actually deterministic particle systems. There is the insight that compli- 
cated systems can form patterns through interplay between symmetry, conservation laws and 
synchronization. Large scale patterns can be formed from systems with local laws. Finally, 
there is the idea of solving inverse problems using mathematical tools like Fourier theory or 
basic geometry (Eratosthenes could compute the radius of the earth by comparing the lengths 
of shadows at different places of the earth). An example is tomography, where the structure 
of some object is explored using resonance and where the reconstruction solves an inverse 
problem. Then there is the idea of scale invariance which allows one to describe objects which 
have a fractal nature. 
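
The discretization idea mentioned above, using the Runge-Kutta scheme to compute a trajectory, can be made concrete with the classical fourth order method. This is a minimal sketch for a scalar equation y' = f(t, y); the function names `rk4_step` and `integrate` and the test equation y' = y are assumptions made for illustration.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h with classical RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f, t0, y0, t1, steps):
    """Replace the smooth trajectory by `steps` discrete RK4 updates."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Example: y' = y with y(0) = 1 has the exact solution y(1) = e.
approx = integrate(lambda t, y: y, 0.0, 1.0, 1.0, steps=100)
error = abs(approx - math.e)
```

Halving the step size should reduce the error by roughly a factor of 16, which reflects the fourth order accuracy of the scheme.
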


COMPUTER SCIENCE 


As in physics, it is harder to pinpoint “big ideas" in computer science as they are in general not 
theorems. But it has been done [405]. The initial step of mathematics was to build a language, 
where numbers represent quantities [45]. Physical tools which assist in manipulating numbers 
can already be seen as computing devices. Marks on a bone, pebbles in a clay bag, talking 
knots in a Khipu [26], marks on a clay tablet were the first step. Papyri, paper, magnetic, 
optical and electric storage: the tools to build memory were refined over millennia. The 
mathematical language allowed us to explore topics beyond the finite and also build 
databases. The Khipu concept was already an early form of graph database [8]. Using a finite 
number of symbols we can represent and count infinite sets, have notions of cardinality, have 
various number systems and more generally have algebraic structures. Numbers can 
even be seen as games [144] [409]. A major idea is the concept of an algorithm. Adding or 
multiplying on an abacus already was an algorithm. The concept was refined in geometry, 
where ruler and compass were used as computing devices, like the construction of points 
in a triangle. To measure the effectiveness of an algorithm, one can use notions of complexity. 
This has been made precise by computing pioneers like Alan Turing, as one has to formulate 
first what a “computation" is. The concept of the Turing machine is particularly elegant as 
it is both a theoretical construct as well as a concrete machine (although extremely inefficient). 
In the last century one has seen that computations and proofs are very similar and that they 
have similar general restrictions. There are some tasks which can not be computed with a 
Turing machine and there are theorems which can not be proven in a specific axiom system. 
As mathematics is a language, we have to deal with concepts of syntax, grammar, notation, 
context, parsing, validation, verification. As Mathematics is a human activity which is 
done in our brains, it is related to psychology and computer architecture. Computer science 
aspects are also important in pedagogy and education: how can an idea be communicated 
clearly? How do we motivate? How do we convince peers that a result is true? Examples 
from history show that this is often done by authority and that some proofs 
turned out to be wrong or incomplete, even in the case of fundamental theorems or when 
treated by great mathematicians. (Examples are the fundamental theorem of arithmetic, the 
fundamental theorem of algebra or the wrong published proof of Kempe of the 4 color theorem). 
On the other hand, there were also quite a few results which only got recognized later. The 
work of Galois, for example, only had its impact much later. How come we trust a human brain 
more than an electronic one? We have to make some fundamental assumptions, for example 
that the logical step “if A holds and B holds, then “A and B" holds" is valid. This assumes 
for example that our memory is faithful: after having put A and B in the memory and 
making the conclusion, we have to assume that we did not forget A nor B! Why do we trust 
this more than the memory of a machine? As we are also assisted more and more by electronic 
devices, the question of the validity of computer assisted proofs comes up. The 4-color 
theorem of Kenneth Appel and Wolfgang Haken based on previous work of many others like 
Heinrich Heesch or the proof of the Feigenbaum conjecture of Mitchell Feigenbaum first 
proven by Oscar Lanford III or the proof of the Kepler problem given by Thomas Hales are 
examples. A great general idea is related to the representation of data. This can be done using 
matrices like in a relational database or using other structures like graphs leading to graph 
databases. The ability to use computers allows mathematicians to do experiments. A branch 
of mathematics called experimental mathematics [24] relies heavily on experiments to 
find new theorems or relations. Experiments are related to simulations. We are able, within 
a computer, to build and explore new worlds, like in computer games. We can enhance the 
physical world using virtual reality or augmented reality, or capture a world by 
3D scanning and realize a world by printing the objects [404]. A major theme is artificial 
intelligence [348]. It is related to optimization problems like optimal transport, neural 
nets as well as inverse problems like structure from motion problems. An intelligent 
entity must be able to take information, build a model and then find an optimal strategy to 
solve a given task. A self-driving car for example has to be able to translate pictures from a 
camera and build a map, then determine where to drive. Such tasks are usually considered 
part of applied mathematics but they are very much related with pure mathematics because 
computers also start to learn how to read mathematics, how to verify proofs and to find new 
theorems. Artificial intelligent agents [685], first developed in the 1960s, also learned 
some mathematics. I myself learned about it when incorporating computer algebra systems into 
a chatbot in [402]. AI has now become a big business as Alexa, Siri, Google Home, IBM 
Watson or Cortana demonstrate. But these information systems must be taught, they must 
be able to rank alternative answers, even inject some humor or opinions. Soon, they will be 
able to learn themselves and answer questions like “what are the 10 most important theorems 
in mathematics?" 
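
The remark above that a Turing machine is both a theoretical construct and a concrete machine can be illustrated with a tiny simulator. This is a sketch under stated assumptions: the rule format, the state names "start" and "halt" and the blank symbol "_" are choices made here for illustration, not a standard.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a one-tape machine until it reaches the state "halt".

    rules maps (state, symbol) to (new_symbol, move, new_state),
    where move is -1 (left) or +1 (right).
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; the machine may not halt")
        symbol = cells.get(head, blank)
        new_symbol, move, state = rules[(state, symbol)]
        cells[head] = new_symbol
        head += move
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Unary increment: skip over the 1s, write one more 1 on the first blank, halt.
increment = {
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("1", +1, "halt"),
}
result = run_turing_machine(increment, "111")
```

The step budget is needed because, as discussed above, there is no general procedure that decides in advance whether a given machine halts.
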
BREVITY 


We live in an Instagram, Snapchat, Twitter, microblog, Vine, TikTok, WatchMojo, pecha-kucha 
time. Many of us multitask, read news on smartphones, watch faster paced movies, read 
shorter novels and feel that Marcel Proust’s million word masterpiece “a la recherche du 
temps perdu" is “temps perdu". Even classrooms and seminars have become more aphoristic. 
Micro blogging tools are only the latest incarnation of “miniature stories". They continue 
the tradition of older formats, from “mural art" by Romans to modern graffiti or “aphorisms" 
[421], poetry, cartoons, Unix fortune cookies [21]. Shortness has appeal: aphorisms, 
poems, fairy tales, quotes, words of wisdom, life hacker lists, and tabloid top 10 lists illustrate 
this. And then there are books like “Math in 5 minutes", “30 second math", “math in minutes" 
[46] [195], which are great coffee table additions. Also short jokes are appealing, like “Let 
epsilon be smaller than zero", which is the shortest known math joke, or “There are three types 
of mathematicians, the ones who can count, and the ones who can’t." Also short open problems 
are attractive, like the twin prime problem “there are infinitely many twin primes" or the 
Landau problem “there are infinitely many primes of the form $n^2 + 1$", or the Goldbach 
problem “every even n > 2 is the sum of two primes". For the larger public in mathematics 
shortness has appeal: according to a poll of the Mathematical Intelligencer from 1988, the 
most favorite theorems are short [687]. Results with longer proofs can make it to graduate 
courses or specialized textbooks but still then, the results are often short enough so that they 
can be tweeted without proof. Why is shortness attractive? Paul Erdős described short elegant 
proofs as “proofs from the book" [12]. Shortness reduces the possibility of error as complexity is 
always a stumbling block for understanding. But is beauty equivalent to brevity? Buckminster 
Fuller once said: “If the solution is not beautiful, I know it is wrong." [9]. Much about the 
aesthetics in mathematics is investigated in [497]. According to [570], the beauty of a piece 
of mathematics is frequently associated with the shortness of statement or of proof: beautiful 
theories are also thought of as short, self-contained chapters fitting within broader theories. 
There are examples of complex and extensive theories which every mathematician agrees to 
be beautiful, but these examples are not the ones which come to mind. Also psychologists and 
educators know that simplicity appeals to children: “For now, I want simply to draw 
attention to the fact that even for a young, mathematically naive child, aesthetic sensibilities 
and values (a penchant for simplicity, for finding the building blocks of more complex ideas, 
and a preference for shortcuts and “liberating" tricks rather than cumbersome recipes) animates 
mathematical experience." It is hard to exhaust all short texts, not even with tweets: there are more 
than googol$^2 = 10^{200}$ texts of length 140. These can not all ever be written down because there 
are more of them than the estimated number of elementary particles. But there are even short 
story collections. Berry’s paradox tells in this context that the shortest non-tweetable text in 
140 characters can be tweeted: “The shortest non-tweetable text". Since we insist on giving 
proofs, we have to cut corners. Books containing lots of elegant examples are [17] [12]. We 
should add that brevity is not a new thing. J.E. Littlewood has raised the question how short 
a dissertation can be and proves in an example, that two sentences are enough and gives a 
one-sentence proof of the fact that bounded entire functions are constant by using Cauchy’s 
integral theorem. It has been refined a bit in [713]. 
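
The count of possible tweets mentioned above is easy to check with exact integer arithmetic. The alphabet size is an assumption made here (the text does not fix one): with 27 symbols, for instance 26 letters and a space, the number of texts of length 140 already exceeds googol squared.

```python
GOOGOL = 10 ** 100


def count_texts(alphabet_size, length=140):
    """Number of strings of exactly `length` symbols over the alphabet."""
    return alphabet_size ** length


# Assumed alphabets: 27 symbols versus 26 symbols.
with_space = count_texts(27)
letters_only = count_texts(26)
exceeds = with_space > GOOGOL ** 2
```

With only 26 symbols the count stays slightly below $10^{200}$, so the threshold is crossed exactly when a 27th symbol is allowed.
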


TWITTER MATH 


The following 42 tweets were written in 2014, when twitter still had a 140 character limit. Some 
of them were actually tweeted. The experiment was to see which theorems are short enough 
so that one can tweet both the theorem as well as the proof in 140 characters. Of course, that 
often required a bit of cheating. For proofs with full details, see the “proofs from the book" 
cited above. 


Euclid: The set of primes is infinite. Proof: let p be largest 
prime, then p! +1 has a larger prime factor than p. Contradic- 
tion. 


Euclid: $2^p-1$ prime then $2^{p-1}(2^p-1)$ is perfect. Proof: $\sigma(n)$ 
sum of factors of $n$, $\sigma((2^p-1)2^{p-1}) = \sigma(2^p-1)\sigma(2^{p-1}) = 
2^p(2^p-1) = 2\cdot 2^{p-1}(2^p-1)$ shows $\sigma(k) = 2k$. 
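
Euclid's statement can be checked numerically for small exponents. The sketch below uses a naive divisor sum; the helper names `sigma`, `is_prime` and `euclid_perfect` are hypothetical and only meant for illustration.

```python
def sigma(n):
    """Sum of all positive divisors of n, by trial division up to sqrt(n)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(n):
    """n > 1 is prime exactly when its only divisors are 1 and n."""
    return n > 1 and sigma(n) == n + 1


def euclid_perfect(p):
    """Return 2^(p-1) (2^p - 1) if 2^p - 1 is prime, otherwise None."""
    m = 2 ** p - 1
    return 2 ** (p - 1) * m if is_prime(m) else None


perfect_numbers = [euclid_perfect(p) for p in (2, 3, 5, 7)]
```

For p = 11 the number $2^{11}-1 = 2047 = 23 \cdot 89$ is not prime, so the construction returns nothing in that case.
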


Hippasus: $\sqrt{2}$ is irrational. Proof: If $\sqrt{2} = p/q$, then $2q^2 = 
p^2$. To the left is an odd number of factors 2, to the right it is 
an even one. Contradiction. 


Pythagorean triples: all $x^2+y^2 = z^2$ are of form $(x,y,z) = 
(2st, s^2-t^2, s^2+t^2)$. Proof: $x$ or $y$ is even (both odd gives 
$x^2+y^2 = 2k$ with odd $k$). Say $x^2$ is even: write $x^2 = z^2-y^2 = 
(z-y)(z+y)$. This is $4s^2t^2$. Therefore $2t^2 = z-y$, $2s^2 = z+y$. 
Solve for $z, y$. 
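
The parametrization can be explored with a short script. It is a sketch which assumes the usual extra conditions for primitive triples, namely that s > t are coprime and of opposite parity; these conditions and the function names are not part of the tweet.

```python
from math import gcd


def triple(s, t):
    """The triple (2st, s^2 - t^2, s^2 + t^2) for integers s > t >= 1."""
    return (2 * s * t, s * s - t * t, s * s + t * t)


def primitive_triples(limit):
    """Primitive triples with hypotenuse at most `limit`."""
    found = []
    s = 2
    while s * s + 1 <= limit:
        for t in range(1, s):
            # coprime and of opposite parity gives a primitive triple
            if (s - t) % 2 == 1 and gcd(s, t) == 1:
                x, y, z = triple(s, t)
                if z <= limit:
                    found.append((x, y, z))
        s += 1
    return found


triples = primitive_triples(30)
```
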


Pigeon principle: if n + 1 pigeons live in n boxes, there is a 
box with 2 or more pigeons. Proof: place a pigeon in each box 
until every box is filled. The pigeon left must have a roommate. 


Angle sum in triangle: $\alpha+\beta+\gamma = KA+\pi$ if $K$ is curvature, 
$A$ triangle area. Proof: Gauss-Bonnet for surface with 
boundary. $\alpha, \beta, \gamma$ are Dirac measures on the boundary. 


Chinese remainder theorem: a(i) x = b(i) mod n(i) has a 
solution if gcd(a(i),n(i))=1 and gcd(n(i),n(j))=1. Proof: solve 
eq(1), then increment x by n(1) to solve eq(2), then increment 
x by n(1) n(2) until eq(3) is ok, etc. 
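
The proof is already an algorithm. The following sketch implements it under the simplifying assumption a(i) = 1 for all i (the general case reduces to this one by multiplying each equation with the inverse of a(i) modulo n(i)); the function name `crt` is hypothetical.

```python
from math import gcd


def crt(residues, moduli):
    """Solve x = residues[i] mod moduli[i] for pairwise coprime moduli."""
    x, step = 0, 1
    for b, n in zip(residues, moduli):
        if gcd(step, n) != 1:
            raise ValueError("moduli must be pairwise coprime")
        # Incrementing by `step` keeps all earlier equations satisfied.
        while x % n != b % n:
            x += step
        step *= n
    return x


# x = 2 mod 3, x = 3 mod 5, x = 2 mod 7
solution = crt([2, 3, 2], [3, 5, 7])
```

Each inner loop terminates after at most n increments because `step` is invertible modulo n, so x runs through all residue classes.
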


Nullstellensatz: algebraic sets in $K^n$ are 1:1 to radical ideals 
in $K[x_1, \dots, x_n]$. Proof: An algebra over $K$ which is a field is a finite 
field extension of $K$. 


Fundamental theorem algebra: a polynomial of degree $n$ 
has exactly $n$ roots. Proof: the metric $g = |f|^{-2/n}|dz|^2$ on 
the Riemann sphere has curvature $K = n^{-1}\Delta \log|f|$. Without 
root, $K=0$ everywhere contradicting Gauss-Bonnet [15]. 
Fermat: $p$ prime, $(a,p) = 1$, then $p \mid a^p - a$. Proof: induction 
with respect to $a$. Case $a = 1$ is trivial. $(a+1)^p - (a+1)$ 
is congruent to $a^p - a$ modulo $p$ because binomial coefficients 
$B(p,k)$ are divisible by $p$ for $k = 1, \dots, p-1$. 


Wilson: $p$ is prime iff $p \mid (p-1)!+1$. Proof: Group $2, \dots, p-2$ 
into pairs $(a, a^{-1})$ whose product is 1 modulo $p$. Now $(p-1)! = 
(p-1) = -1$ modulo $p$. If $p = ab$ is not prime, then $a$ divides $(p-1)!$, 
so neither $a$ nor $p$ divides $(p-1)!+1$. 
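
Wilson's criterion translates directly into a primality test, although a very slow one. The function name below is hypothetical.

```python
from math import factorial


def wilson_is_prime(p):
    """p > 1 is prime exactly when p divides (p - 1)! + 1."""
    return p > 1 and (factorial(p - 1) + 1) % p == 0


small_primes = [p for p in range(2, 30) if wilson_is_prime(p)]
```

The test is of theoretical interest only: computing $(p-1)!$ takes far more work than trial division.
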


Bayes: $A, B$ are events and $A^c$ is the complement. $P[A|B] = 
P[B|A]P[A]/(P[B|A]P[A] + P[B|A^c]P[A^c])$. Proof: By definition 
$P[A|B]P[B] = P[A \cap B]$. Also $P[B] = P[B|A]P[A] + 
P[B|A^c]P[A^c]$. 


Archimedes: Volume of sphere $S(r)$ is $4\pi r^3/3$. Proof: the 
complement of the cone inside the cylinder has at height $z$ the 
cross section area $\pi r^2 - \pi z^2$, the same as the cross section area of 
the sphere at height $z$. 


Archimedes: the area of the sphere $S(r)$ is $4\pi r^2$. Proof: differentiate 
the volume formula with respect to $r$ or project the 
sphere onto a cylinder of height $2r$ and circumference $2\pi r$ and 
note that this is area preserving. 


Cauchy-Schwarz: $|v \cdot w| \le |v||w|$. Proof: scale to get $|w| = 1$, 
define $a = v \cdot w$, so that $0 \le (v-aw)\cdot(v-aw) = |v|^2 - a^2 = 
|v|^2|w|^2 - (v \cdot w)^2$. 


Angle formula: Cauchy-Schwarz defines the angle between 
two vectors as cos(A) = v.w/|v||w|. If v, w are centered random 
variables, then v - w is the covariance, |v|,|w| are standard 
deviations and cos(A) is the correlation. 
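
The identification of correlation with the cosine of an angle can be tried on small data vectors. This sketch uses plain lists and hypothetical helper names; the data are made up for illustration.

```python
import math


def centered(v):
    """Subtract the mean so that the vector is a centered random variable."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def dot(v, w):
    """Euclidean dot product."""
    return sum(x * y for x, y in zip(v, w))


def correlation(v, w):
    """cos(A) = v.w / (|v||w|) for the centered versions of v and w."""
    cv, cw = centered(v), centered(w)
    return dot(cv, cw) / math.sqrt(dot(cv, cv) * dot(cw, cw))


aligned = correlation([1, 2, 3], [2, 4, 6])             # angle 0
opposite = correlation([1, 2, 3], [3, 2, 1])            # angle pi
orthogonal = correlation([1, 2, 3, 4], [1, -1, -1, 1])  # angle pi/2
```

By Cauchy-Schwarz the returned value always lies between -1 and 1, which is what makes the definition of the angle possible.
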


Cos formula: $c^2 = a^2 + b^2 - 2ab\cos(A)$ in a triangle $ABC$ 
(Al-Kashi theorem). Proof: $v = AB$, $w = AC$ have lengths $a = 
|v|$, $b = |w|$, $c = |v-w|$. Now: $(v-w)\cdot(v-w) = |v|^2 + |w|^2 - 
2|v||w|\cos(A)$. 


Pythagoras: $A = \pi/2$, then $c^2 = a^2+b^2$. Proof: Let $v = AB$, 
$w = AC$, $v-w = BC$ be the sides of the triangle. Multiply 
out $(v-w)\cdot(v-w) = |v|^2 + |w|^2$ and use $v \cdot w = 0$. 
Euler formula: $\exp(ix) = \cos(x) + i\sin(x)$. Proof: $\exp(ix) = 
1 + (ix) + (ix)^2/2! + \dots$ Pair real and imaginary parts and use 
definition $\cos(x) = 1 - x^2/2! + x^4/4! - \dots$ and $\sin(x) = x - x^3/3! + 
x^5/5! - \dots$ 


Discrete Gauss-Bonnet: $\sum_x K(x) = \chi(G)$ with $K(x) = 1 - 
V_0(x)/2 + V_1(x)/3 - V_2(x)/4 \dots$ curvature, $\chi(G) = v_0 - v_1 + v_2 - 
v_3 \dots$ Euler characteristic. Proof: Use handshake $\sum_x V_k(x)/(k+2) = 
v_{k+1}$. 


Poincaré-Hopf: let $f$ be a coloring, $i_f(x) = 1 - \chi(S_f^-(x))$, 
where $S_f^-(x) = \{ y \in S(x) \mid f(y) < f(x) \}$. Then $\sum_x i_f(x) = \chi(G)$. Proof 
by induction. Removing a local maximum of $f$ reduces the Euler 
characteristic by $\chi(B_f(x)) - \chi(S_f^-(x)) = i_f(x)$. 


Lefschetz: $\sum_x i_T(x) = \mathrm{str}(T|H(G))$. Proof: LHS is 
$\mathrm{str}(\exp(-0L)U_T)$ and RHS is $\mathrm{str}(\exp(-tL)U_T)$ for $t \to \infty$. 
The super trace does not depend on $t$. 


Stokes: orient edges $E$ of graph $G$. $F: E \to \mathbb{R}$ function, $S$ 
surface in $G$ with boundary $C$. $d(F)(ijk) = F(ij) + F(jk) - 
F(ki)$ is the curl. The sum of the curls over all triangles is the 
line integral of $F$ along $C$. 


Plato: there are exactly 5 platonic solids. Proof: number 
$f$ of $n$-gons satisfies $f = 2e/n$, $v$ vertices of degree $m$ satisfy 
$v = 2e/m$. $v-e+f = 2$ means $2e/m - e + 2e/n = 2$ or $1/m+1/n = 
1/e+1/2$ with solutions $(m,n) = (3,3), (3,4), (4,3), (3,5), (5,3)$. 


Poincaré recurrence: $T$ area-preserving map of probability 
space $(X,m)$. If $m(A) > 0$ and $n > 1/m(A)$ we have 
$m(T^k(A) \cap A) > 0$ for some $1 \le k \le n$. Proof: Otherwise 
$A, T(A), \dots, T^n(A)$ are all disjoint and the union has measure 
at least $n \cdot m(A) > 1$. 


Turing: there is no Turing machine which decides whether an input 
Turing machine halts. Proof: otherwise build another 
one which halts if the input is a non-halting one and does not 
halt if the input is a halting one. Run it on itself. 
Cantor: the set of reals in [0,1] is uncountable. Proof: if there 
is an enumeration x(k), let x(k,l) be the l’th digit of x(k) 
in binary form. The number with binary expansion y(k) = 
x(k,k) +1 mod 2 is not in the list. 


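Niven: $\pi \notin \mathbb{Q}$. Proof: if $\pi = a/b$, then $f(x) = x^n(a-bx)^n/n!$ satisfies 
$f(\pi-x) = f(x)$ and $0 < f(x) < \pi^n a^n/n!$ on $(0,\pi)$. $f^{(j)}(x) = 0$ at $0$ and 
$\pi$ for $0 \le j < n$ shows $F(x) = f(x) - f^{(2)}(x) + f^{(4)}(x) - \dots + 
(-1)^n f^{(2n)}(x)$ has $F(0), F(\pi) \in \mathbb{Z}$ and $F + F'' = f$. Now 
$(F'(x)\sin(x) - F(x)\cos(x))' = f\sin(x)$, so $\int_0^\pi f(x)\sin(x)\,dx \in 
\mathbb{Z}$, which is impossible for large $n$. 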


Fundamental theorem calculus: With differentiation 
$Df(x) = f(x+1) - f(x)$ and integration $Sf(x) = f(0) + f(1) + 
\dots + f(x-1)$ have $SDf(x) = f(x) - f(0)$, $DSf(x) = f(x)$. 
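
Both identities of this discrete fundamental theorem can be verified mechanically on the non-negative integers. The operators are written as higher order functions; the test function f is an arbitrary choice made for illustration.

```python
def D(f):
    """Discrete derivative: Df(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)


def S(f):
    """Discrete integral: Sf(x) = f(0) + f(1) + ... + f(x - 1)."""
    return lambda x: sum(f(k) for k in range(x))


def f(x):
    """An arbitrary integer valued test function."""
    return x ** 3 - 2 * x + 5


sd_holds = all(S(D(f))(x) == f(x) - f(0) for x in range(20))
ds_holds = all(D(S(f))(x) == f(x) for x in range(20))
```

The first identity is a telescoping sum, the second holds because consecutive partial sums differ by exactly one term.
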


Taylor: $f(x+t) = \sum_k f^{(k)}(x)t^k/k!$. Proof: $f(x+t)$ satisfies the 
transport equation $f_t = f_x = Df$, an ODE for the differential 
operator $D$. Solve $f(x+t) = \exp(Dt)f(x)$. 


Cauchy-Binet: $\det(1 + F^TG) = \sum_P \det(F_P)\det(G_P)$. 
Proof: $A = F^TG$. Coefficients of $\det(x - A)$ is 


Intermediate: $f$ continuous, $f(0) < 0$, $f(1) > 0$, then there 
exists $0 < x < 1$ with $f(x) = 0$. Proof: If $f(1/2) < 0$ redo proof with 
$(1/2,1)$. If $f(1/2) > 0$ redo proof with $(0,1/2)$. 
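
The halving argument in the proof is the bisection method. Below is a minimal sketch with a hypothetical function name; the tolerance is an arbitrary choice.

```python
def bisect_root(f, lo=0.0, hi=1.0, tol=1e-12):
    """Find a root of a continuous f with f(lo) < 0 < f(hi) by halving."""
    if not (f(lo) < 0 < f(hi)):
        raise ValueError("need f(lo) < 0 < f(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        value = f(mid)
        if value == 0:
            return mid
        if value < 0:
            lo = mid  # a sign change remains in (mid, hi)
        else:
            hi = mid  # a sign change remains in (lo, mid)
    return (lo + hi) / 2


# Example: x^2 - 1/2 changes sign on (0, 1); its root there is sqrt(1/2).
root = bisect_root(lambda x: x * x - 0.5)
```

Each step halves the interval, so about 40 steps suffice for the tolerance used here.
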


Ergodicity: $T(x) = x + \alpha \mod 1$ with irrational $\alpha$ 
is ergodic. Proof: $f = \sum_n a(n)\exp(inx)$, $Tf = 
\sum_n a(n)\exp(in\alpha)\exp(inx) = f$ implies $a(n) = 0$ for $n \ne 0$. 


Benford: first digit $k$ of $2^n$ appears with probability $\log(1 + 
1/k)$. Proof: $T: x \to x + \log(2) \mod 1$ is ergodic. 
The first digit of $2^n$ is $k$ if $\log(k) \le T^n(0) < \log(k+1)$. The probability 
of hitting this interval is $\log(k+1) - \log(k)$. 
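
The law can be observed experimentally with exact integer arithmetic. This sketch counts leading digits of the first 1000 powers of 2; the sample size and the function names are choices made for illustration, and logarithms are taken to base 10.

```python
import math
from collections import Counter


def benford(k):
    """Predicted frequency log10(1 + 1/k) of the leading digit k."""
    return math.log10(1 + 1 / k)


def first_digit_frequencies(count):
    """Observed frequencies of the leading digit of 2^n for n = 1..count."""
    digits = Counter(int(str(2 ** n)[0]) for n in range(1, count + 1))
    return {k: digits[k] / count for k in range(1, 10)}


observed = first_digit_frequencies(1000)
largest_gap = max(abs(observed[k] - benford(k)) for k in range(1, 10))
```

The nine predicted frequencies add up to 1 because the sum of $\log(k+1) - \log(k)$ telescopes to $\log(10)$.
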


Rank-Nullity: dim(ker(A)) + dim(im(A)) = n for m x n 
matrix A. Proof: a column has a leading 1 in rref(A) or no 
leading 1. In the first case it contributes to the image, in the 
second to a free variable parametrizing the kernel. 


Column-Row picture: $A: \mathbb{R}^m \to \mathbb{R}^n$. The $k$’th column of $A$ 
is the image $Ae_k$. If all rows of $A$ are perpendicular to $x$ then 
$x$ is in the kernel of $A$. 


Picard: $x' = f(x)$, $x(0) = x_0$ has locally a unique solution if 
$f \in C^1$. Proof: the map $T(y) = x_0 + \int_0^t f(y(s))\,ds$ is a contraction 
on $C([0,a])$ for small enough $a > 0$. Banach fixed point 
theorem. 


Banach: a contraction $d(T(x),T(y)) \le a\,d(x,y)$ with $a < 1$ on complete 
$(X,d)$ has a unique fixed point. Proof: $d(x_k,x_n) \le a^k d(x_0,x_1)/(1-a)$ 
for $n \ge k$ using triangle inequality and geometric series. Have Cauchy 
sequence. 
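
The proof is constructive: iterating the map from any starting point converges to the fixed point. The sketch below assumes a real valued contraction; the example uses the cosine, which maps [0, 1] into itself with $|\cos'(x)| \le \sin(1) < 1$ there. The function name and tolerance are illustrative choices.

```python
import math


def fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x -> T(x) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        nxt = T(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence within max_iter iterations")


# The unique solution of cos(x) = x in [0, 1].
x_star = fixed_point(math.cos, 1.0)
```

The geometric series estimate in the proof also bounds the error: once two successive iterates differ by less than tol, the distance to the fixed point is at most tol a/(1 - a).
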


Liouville: every prime $p = 4k+1$ is the sum of two squares. 
Proof: there is an involution on $S = \{ (x,y,z) \mid x^2 + 4yz = p \}$ with 
exactly one fixed point showing $|S|$ is odd implying $(x,y,z) \to 
(x,z,y)$ has a fixed point. 


Banach-Tarski: The unit ball in $\mathbb{R}^3$ can be cut into 5 pieces, 
re-assembled using rotation and translation to get two unit balls. 
Proof: cut cleverly using axiom of choice. 


MATH AREAS 


We add here the core handouts of Math E320 which aimed to give for each of the 12 math- 
ematical subjects an overview on two pages. For that course, I had recommended books like 
E-320: Teaching Math with a Historical Perspective O. Knill, 2010-2018 


Lecture 1: Mathematical roots 


Similarly, as one has distinguished the canons of rhetorics: memory, invention, delivery, style, and arrange- 
ment, or combined the trivium: grammar, logic and rhetorics, with the quadrivium: arithmetic, geometry, 
music, and astronomy, to obtain the seven liberal arts and sciences, one has tried to organize all mathe- 
matical activities. 


Historically, one has distinguished eight ancient roots of mathematics. Each of these 8 
activities in turn suggest a key area in mathematics: 

counting and sorting     | arithmetic 
spacing and distancing   | geometry 
positioning and locating | topology 
surveying and angulating | trigonometry 
balancing and weighing   | statics 
moving and hitting       | dynamics 
guessing and judging     | probability 
collecting and ordering  | algorithms 


To morph these 8 roots to the 12 mathematical areas covered in this class, we complemented the ancient roots 
with calculus, numerics and computer science, merge trigonometry with geometry, separate arithmetic into 
number theory, algebra and arithmetic and turn statics into analysis. 


Let us call this modern adaptation the 12 modern roots of Mathematics: 

counting and sorting     | arithmetic 
spacing and distancing   | geometry 
positioning and locating | topology 
dividing and comparing   | number theory 
balancing and weighing   | analysis 
moving and hitting       | dynamics 
guessing and judging     | probability 
collecting and ordering  | algorithms 
slicing and stacking     | calculus 
operating and memorizing | computer science 
optimizing and planning  | numerics 
manipulating and solving | algebra 


While relating mathematical areas with human activities is useful, it makes sense to select 
specific topics in each of these areas. These 12 topics will be the 12 lectures of this course. 

Arithmetic    | numbers and number systems 
Geometry      | invariance, symmetries, measurement, maps 
Number theory | Diophantine equations, factorizations 
Algebra       | algebraic and discrete structures 
Calculus      | limits, derivatives, integrals 
Set Theory    | set theory, foundations and formalisms 
Probability   | combinatorics, measure theory and statistics 
Topology      | polyhedra, topological spaces, manifolds 
Analysis      | extrema, estimates, variation, measure 
Numerics      | numerical schemes, codes, cryptology 
Dynamics      | differential equations, maps 
Algorithms    | computer science, artificial intelligence 


Like any classification, this chosen division is rather arbitrary and a matter of personal preferences. The 2010 
AMS classification distinguishes 64 areas of mathematics. Many of the just defined main areas are broken 
off into even finer pieces. Additionally, there are fields which relate with other areas of science, like economics, 
biology or physics: 


00 General
01 History and biography
03 Mathematical logic and foundations
05 Combinatorics
06 Lattices, ordered algebraic structures
08 General algebraic systems
11 Number theory
12 Field theory and polynomials
13 Commutative rings and algebras
14 Algebraic geometry
15 Linear/multi-linear algebra; matrix theory
16 Associative rings and algebras
17 Non-associative rings and algebras
18 Category theory, homological algebra
19 K-theory
20 Group theory and generalizations
22 Topological groups, Lie groups
26 Real functions
28 Measure and integration
30 Functions of a complex variable
31 Potential theory
32 Several complex variables, analytic spaces
33 Special functions
34 Ordinary differential equations
35 Partial differential equations
37 Dynamical systems and ergodic theory
39 Difference and functional equations
40 Sequences, series, summability
41 Approximations and expansions
42 Fourier analysis
43 Abstract harmonic analysis
44 Integral transforms, operational calculus


45 Integral equations
46 Functional analysis
47 Operator theory
49 Calculus of variations, optimization
51 Geometry
52 Convex and discrete geometry
53 Differential geometry
54 General topology
55 Algebraic topology
57 Manifolds and cell complexes
58 Global analysis, analysis on manifolds
60 Probability theory and stochastic processes
62 Statistics
65 Numerical analysis
68 Computer science
70 Mechanics of particles and systems
74 Mechanics of deformable solids
76 Fluid mechanics
78 Optics, electromagnetic theory
80 Classical thermodynamics, heat transfer
81 Quantum theory
82 Statistical mechanics, structure of matter
83 Relativity and gravitational theory
85 Astronomy and astrophysics
86 Geophysics
90 Operations research, math. programming
91 Game theory, Economics Social and Behavioral Sciences
92 Biology and other natural sciences
93 Systems theory and control
94 Information and communication, circuits
97 Mathematics education

What are fancy developments in mathematics today? Michael Atiyah [29] identified in the year 2000 the
following six hot spots:

local and global
low and high dimension
commutative and non-commutative
linear and nonlinear
geometry and algebra
physics and mathematics


Also this choice is of course highly personal. One can easily add 12 other polarizing quantities which help to
distinguish or parametrize different parts of mathematical areas, especially the ambivalent pairs which produce
a captivating gradient:


regularity and randomness
integrable and non-integrable
invariants and perturbations
experimental and deductive
polynomial and exponential
applied and abstract
discrete and continuous
existence and construction
finite dim and infinite dimensional
topological and differential geometric
practical and theoretical
axiomatic and case based


The goal is to illustrate some of these structures from a historical point of view and show that "Mathematics is
the science of structure".
Lecture 2: Arithmetic


The oldest mathematical discipline is arithmetic. It is the theory of the construction and manipulation of
numbers. The earliest steps were done by Babylonian, Egyptian, Chinese, Indian and Greek thinkers.
Building up the number system starts with the natural numbers 1, 2, 3, 4, ... which can be added and multiplied.
Addition is natural: join 3 sticks to 5 sticks to get 8 sticks. Multiplication * is more subtle: 3 * 4 means to take 3
copies of 4 and get 4+4+4 = 12 while 4 * 3 means to take 4 copies of 3 to get 3+3+3+3 = 12. The first factor
counts the number of operations while the second factor counts the objects. To motivate 3 * 4 = 4 * 3, spatial
insight suggests arranging the 12 objects in a rectangle. This commutativity axiom will be carried over to
larger number systems. Realizing an additive and multiplicative structure on the natural numbers requires
defining 0 and 1. It leads naturally to more general numbers. There are two major motivations to build new
numbers: we want to


1. invert operations and still get results. 2. solve equations. 


To find an additive inverse of 3 means solving $x + 3 = 0$. The answer is a negative number. To solve $x \cdot 3 = 1$,
we get to a rational number $x = 1/3$. To solve $x^2 = 2$ one needs to escape to real numbers. To solve $x^2 = -2$
requires complex numbers.


Numbers              Operation to complete            Examples of equations to solve
Natural numbers      addition and multiplication      $5 + x = 9$
Positive fractions   addition and division
Integers             subtraction                      $5 + x = 3$
Rational numbers     division
Algebraic numbers    taking positive roots            $x^2 = 2$
Real numbers         taking limits                    $x = 1 - 1/3 + 1/5 - + \dots$, $\cos(x) = x$
Complex numbers      take any roots                   $x^2 = -2$
Surreal numbers      transfinite limits
Surreal complex      any operation                    $x^2 + 1 = -\omega$


The development and history of arithmetic can be summarized as follows: humans started with natural numbers, 
dealt with positive fractions, reluctantly introduced negative numbers and zero to get the integers, struggled to 
“realize" real numbers, were scared to introduce complex numbers, hardly accepted surreal numbers and most 
do not even know about surreal complex numbers. Ironically, as simple but impossibly difficult questions in 
number theory show, the modern point of view is the opposite to Kronecker’s "God made the integers; all 
else is the work of man": 

The surreal complex numbers are the most natural numbers; 

The natural numbers are the most complex, surreal numbers. 


Natural numbers. Counting can be realized by sticks, bones, quipu knots, pebbles or wampum knots. The
tally stick concept is still used when playing card games, where bundles of five are formed, maybe by crossing
4 "sticks" with a fifth. There is a "log counting" method in which graphs are used and vertices and edges count.
An old stone age tally stick, the wolf radius bone, contains 55 notches, with 5 groups of 5. It is probably more
than 30’000 years old [616]. The most famous paleolithic tally stick is the Ishango bone, the fibula of a baboon.
It could be 20’000 - 30’000 years old. Earlier counting could have been done by assembling pebbles,
tying knots in a string, making scratches in dirt or bark but no such traces have survived the thousands of
years. The Roman system improved the tally stick concept by introducing new symbols for larger numbers
like V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000 in order to avoid bundling too many single sticks.
The system is unfit for computations, as simple calculations like VIII + VII = XV show. Clay tablets, some as
early as 2000 BC and others from 600 - 300 BC are known. They feature Akkadian arithmetic using the base
60. The sexagesimal system with base 60 is convenient because 60 has many factors. It survived: we use 60 minutes
per hour. The Egyptians used the base 10. The most important source on Egyptian mathematics is the
Rhind Papyrus of 1650 BC. It was found in 1858 [3874] [616]. Hieratic numerals were used to write on papyrus
from 2500 BC on. Egyptian numerals are hieroglyphics. Found in carvings on tombs and monuments they
are 5000 years old. The modern way to write numbers like 2018 is the Hindu-Arab system which diffused
to the West only during the late Middle Ages. It replaced the more primitive Roman system [616]. Greek
arithmetic used a number system with no place values: 9 Greek letters for 1, 2, ..., 9, nine for 10, 20, ..., 90 and
nine for 100, 200, ..., 900.


Integers. Indian Mathematics morphed the place-value system into a modern method of writing numbers.
Hindu astronomers used words to represent digits, but the numbers would be written in the opposite order.
Independently, also the Mayans developed the concept of 0 in a number system using base 20. Sometime
after 500, the Hindus changed to a digital notation which included the symbol 0. Negative numbers were
introduced around 100 BC in the Chinese text "Nine Chapters on the Mathematical Art". Also the Bakhshali
manuscript, written around 300 AD, carried out additions with negative numbers, where +
was used to indicate a negative sign [542]. In Europe, negative numbers were avoided until the 15’th century.


Fractions: Babylonians could handle fractions. The Egyptians also used fractions, but wrote every fraction
as a sum of fractions with unit numerator and distinct denominators, like 4/5 = 1/2 + 1/4 + 1/20 or
5/6 = 1/2 + 1/3. Maybe because of such cumbersome computation techniques, Egyptian mathematics failed to
progress beyond a primitive stage [616]. The modern decimal fractions used nowadays for numerical calculations
were adopted only in 1595 in Europe.
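The unit fraction decompositions quoted above can be reproduced with a short sketch. It uses the greedy rule (always subtract the largest unit fraction that still fits), which is an assumption made here for illustration and not necessarily how Egyptian scribes proceeded; the function name `egyptian` is hypothetical.

```python
from fractions import Fraction
from math import ceil

def egyptian(frac):
    """Greedy expansion of a fraction 0 < frac < 1 into distinct unit fractions.

    Returns the list of denominators. The greedy rule is an illustrative
    assumption, not a reconstruction of the historical method.
    """
    denominators = []
    while frac > 0:
        d = ceil(1 / frac)          # smallest d with 1/d <= frac
        denominators.append(d)
        frac -= Fraction(1, d)
    return denominators

print(egyptian(Fraction(4, 5)))     # denominators for 4/5
print(egyptian(Fraction(5, 6)))     # denominators for 5/6
```

Exact rational arithmetic via `Fraction` avoids rounding problems; the denominators produced by the greedy rule are strictly increasing, so they are automatically distinct.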


Real numbers: As noted by the Greeks already, the diagonal of the unit square is not a fraction. It first produced a
crisis until it became clear that "most" numbers are not rational. Georg Cantor saw first that the cardinality
of all real numbers is much larger than the cardinality of the integers: one can enumerate all rational numbers
but not all real numbers. One consequence is that most real numbers are transcendental: they do
not occur as solutions of polynomial equations with integer coefficients. The number $\pi$ is an example. The
concept of real numbers is related to the concept of limit. Sums like $1 + 1/4 + 1/9 + 1/16 + 1/25 + \dots$ are
not rational.

Complex numbers: some polynomials have no real root. To solve $x^2 = -1$ for example, we need new
numbers. One idea is to use pairs of numbers $(a,b)$ where $(a,0) = a$ are the usual numbers and extend addition
and multiplication $(a,b) + (c,d) = (a+c, b+d)$ and $(a,b) \cdot (c,d) = (ac - bd, ad + bc)$. With this multiplication,
the number $(0,1)$ has the property that $(0,1) \cdot (0,1) = (-1,0) = -1$. It is more convenient to write $a + ib$ where
$i = (0,1)$ satisfies $i^2 = -1$. One can now use the common rules of addition and multiplication.
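The pair construction can be tried out directly. The following minimal sketch implements exactly the two rules stated above on Python tuples; the function names `add` and `mul` are chosen here for illustration.

```python
def add(p, q):
    """(a,b) + (c,d) = (a+c, b+d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """(a,b) * (c,d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
print(mul(i, i))                 # the pair playing the role of -1
print(mul((1, 2), (3, 4)))       # compare with (1 + 2j) * (3 + 4j)
```

The second product can be compared with Python's built-in complex arithmetic, which follows the same rules.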

Surreal numbers: Similarly as real numbers fill in the gaps between the integers, the surreal numbers fill in the
gaps between Cantor’s ordinal numbers. They are written as $(a,b,c,\dots \mid d,e,f,\dots)$, meaning the "simplest"
number which is larger than $a,b,c,\dots$ and smaller than $d,e,f,\dots$. We have $(\mid) = 0$, $(0 \mid) = 1$, $(1 \mid) = 2$ and $(0 \mid 1) = 1/2$
or $(\mid 0) = -1$. Surreals contain already transfinite numbers like $(0,1,2,3,\dots \mid)$ or infinitesimal numbers like
$(0 \mid 1/2, 1/3, 1/4, 1/5, \dots)$. They were introduced in the 1970’s by John Conway. The late appearance confirms
the pedagogical principle: late human discovery manifests in increased difficulty to teach it.
Lecture 3: Geometry


Geometry is the science of shape, size and symmetry. While arithmetic deals with numerical structures,
geometry handles metric structures. Geometry is one of the oldest mathematical disciplines. Early geometry
has relations with arithmetic: the multiplication of two numbers $n \times m$ can be seen as the area of a shape that is invariant
under rotational symmetry. Identities like the Pythagorean triples $3^2 + 4^2 = 5^2$ were interpreted and
drawn geometrically. The right angle is the most "symmetric" angle apart from 0. Symmetry manifests
itself in quantities which are invariant. Invariants are one of the most central aspects of geometry. Felix Klein’s
Erlangen program uses symmetry to classify geometries depending on how large the symmetries of the shapes
are. In this lecture, we look at a few results which can all be stated in terms of invariants. In the presentation
as well as the worksheet part of this lecture, we will work our way through smaller miracles like special points in
triangles as well as a couple of gems: Pythagoras, Thales, Hippocrates, Feuerbach, Pappus, Morley,
Butterfly which illustrate the importance of symmetry.


Much of geometry is based on our ability to measure length, the distance between two points. Having a
distance $d(A,B)$ between any two points $A, B$, we can look at the next more complicated object, which is a set
$A, B, C$ of 3 points, a triangle. Given an arbitrary triangle $ABC$, are there relations between the 3 possible
distances $a = d(B,C)$, $b = d(A,C)$, $c = d(A,B)$? If we fix the scale by $c = 1$, then $a + b > 1$, $a + 1 > b$, $b + 1 > a$.
For any pair $(a,b)$ in this region, there is a triangle. After an identification, we get an abstract space, which
represents all triangles uniquely up to similarity. Mathematicians call this an example of a moduli space.


A sphere $S_r(x)$ is the set of points which have distance $r$ from a given point $x$. In the plane, the sphere is called
a circle. A natural problem is to find the circumference $L = 2\pi$ of a unit circle, or the area $A = \pi$ of a unit disc,
the area $F = 4\pi$ of a unit sphere and the volume $V = 4\pi/3$ of a unit sphere. Measuring the length of segments
on the circle leads to new concepts like angle or curvature. Because the circumference of the unit circle in the
plane is $L = 2\pi$, angle questions are tied to the number $\pi$, which Archimedes already approximated by fractions.


Also volumes were among the first quantities mathematicians wanted to measure and compute. A problem
on the Moscow papyrus dating back to 1850 BC explains the general formula $h(a^2 + ab + b^2)/3$ for the volume of a truncated
pyramid with base length $a$, roof length $b$ and height $h$. Archimedes managed to compute the volume of the
sphere: place a cone inside a cylinder. The complement of the cone inside the cylinder has at each height $h$
the area $\pi - \pi h^2$. The half sphere cut at height $h$ is a disc of radius $\sqrt{1 - h^2}$ which has area $\pi(1 - h^2)$ too.
Since the slices at each height have the same area, the volumes must be the same. The complement of the cone
inside the cylinder has volume $\pi - \pi/3 = 2\pi/3$, half the volume of the sphere.
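The slicing argument of Archimedes can be checked numerically. The sketch below sums thin slices with the midpoint rule, a choice made here for illustration; the function names and the number of slices are assumptions and not part of the original argument.

```python
from math import pi, sqrt

def sphere_slice_area(h):
    """Area of the disc cut from the unit ball at height h."""
    radius = sqrt(1 - h * h)
    return pi * radius * radius

def complement_slice_area(h):
    """Area at height h of the unit cylinder disc minus the cone disc of radius h."""
    return pi - pi * h * h

def volume(slice_area, steps=10000):
    """Midpoint-rule approximation of the volume between heights 0 and 1."""
    dh = 1.0 / steps
    return sum(slice_area((k + 0.5) * dh) for k in range(steps)) * dh

print(volume(sphere_slice_area), volume(complement_slice_area), 2 * pi / 3)
```

Both approximate volumes agree with $2\pi/3$ up to the discretization error, because the two slice areas coincide at every height.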


The first geometric playground was planimetry, the geometry in the flat two dimensional space. Highlights
are Pythagoras theorem, Thales theorem, Hippocrates theorem, and Pappus theorem. Discoveries
in planimetry were still made much later: an example is the Feuerbach 9 point theorem from the 19th century.
Ancient Greek Mathematics is closely related to history. It starts with Thales, goes over Euclid’s era around 300
BC and ends with the threefold destruction of Alexandria: 47 BC by the Romans, 392 by the Christians and
640 by the Muslims. Geometry was also a place where the axiomatic method was brought to mathematics:
theorems are proved from a few statements which are called axioms like the 5 axioms of Euclid:
1. Any two distinct points $A, B$ determine a line through $A$ and $B$.
2. A line segment $[A,B]$ can be extended to a straight line containing the segment.
3. A line segment $[A,B]$ determines a circle containing $B$ and center $A$.
4. All right angles are congruent.
5. If lines $L, M$ intersect with a third so that inner angles add up to $< \pi$, then $L, M$ intersect.


Euclid wondered whether the fifth postulate can be derived from the first four and called theorems derived
from the first four the "absolute geometry". Only much later, in the 19’th century, Carl Friedrich Gauss, Janos Bolyai
and Nikolai Lobachevsky realized that in hyperbolic space the 5’th axiom does not hold. Indeed,
geometry can be generalized to non-flat, or even much more abstract situations. Basic examples are geometry
on a sphere leading to spherical geometry or geometry on the Poincare disc, a hyperbolic space. Both
of these geometries are non-Euclidean. Riemannian geometry, which is essential for general relativity
theory, generalizes both concepts to a great extent. An example is the geometry on an arbitrary surface. Curvatures
of such spaces can be computed by measuring length alone, which is how long light needs to go from
one point to the next.


An important moment in mathematics was the merge of geometry with algebra: this giant step is often 
attributed to René Descartes. Together with algebra, the subject leads to algebraic geometry which can 
be tackled with computers: here are some examples of geometries which are determined from the amount of 
symmetry which is allowed: 


Euclidean geometry Properties invariant under a group of rotations and translations 
Affine geometry Properties invariant under a group of affine transformations 
Projective geometry Properties invariant under a group of projective transformations 
Spherical geometry Properties invariant under a group of rotations 

Conformal geometry Properties invariant under angle preserving transformations 
Hyperbolic geometry Properties invariant under a group of Mobius transformations 


Here are four pictures about the 4 special points in a triangle with which we will begin the lecture. We will
see why in each of these cases, the 3 lines intersect in a common point. It is a manifestation of a symmetry
present on the space of all triangles: the distance between the intersection points stays constant 0 if we move on the
space of all triangular shapes. It’s Geometry!
Lecture 4: Number Theory


Number theory studies the structure of integers like prime numbers and solutions to Diophantine equations.
Gauss called it the "Queen of Mathematics". Here are a few theorems and open problems.

An integer larger than 1 which is divisible by 1 and itself only is called a prime number. The number
$2^{57885161} - 1$ was the largest known prime number when it was found in 2013. It has 17425170 digits. Euclid proved that there are infinitely
many primes: [Proof. Assume there are only finitely many primes $p_1 < p_2 < \dots < p_n$. Then $N = p_1 p_2 \cdots p_n + 1$
is not divisible by any of $p_1, \dots, p_n$. Therefore, it is a prime or divisible by a prime larger than $p_n$.] Primes become
sparser the larger they get. An important result is the prime number theorem which states that the $n$’th
prime number has approximately the size $n \log(n)$. For example the $n = 10^{12}$’th prime is $p(n) = 29996224275833$
and $n \log(n) = 27631021115928.545\dots$ and $p(n)/(n \log(n)) = 1.0856\dots$ Many questions about prime numbers
are unsettled. Here are five problems; the fourth uses the notation $(\Delta a)_n = |a_{n+1} - a_n|$ for the absolute
difference. For example: $\Delta^2(1,4,9,16,25,\dots) = \Delta(3,5,7,9,11,\dots) = (2,2,2,2,\dots)$. Progress on prime gaps was
made in 2013: $p_{n+1} - p_n$ is smaller than 100’000’000 infinitely often (Yitang Zhang) and even smaller than
600 infinitely often (Maynard). The largest known gap is 1476 which occurs after $p = 1425172824437699411$.


Landau        there are infinitely many primes of the form $n^2 + 1$.
Twin prime    there are infinitely many primes $p$ such that $p + 2$ is prime.
Goldbach      every even integer $n > 2$ is a sum of two primes.
Gilbreath     If $p_n$ enumerates the primes, then $(\Delta^k p)_1 = 1$ for all $k > 0$.
Andrica       The prime gap estimate $\sqrt{p_{n+1}} - \sqrt{p_n} < 1$ holds for all $n$.
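The difference notation used in the Gilbreath problem, and the Andrica estimate, can be explored on a small range of primes. This is only an experiment under arbitrary bounds chosen here (primes up to 2000, the first 199 difference rows); it says nothing about the open problems themselves. The helper names are hypothetical.

```python
from math import sqrt

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

def delta(seq):
    """Absolute differences of neighbors: (delta a)_n = |a_{n+1} - a_n|."""
    return [abs(b - a) for a, b in zip(seq, seq[1:])]

primes = primes_up_to(2000)

# Gilbreath: the first entry of every iterated difference row should be 1.
row = primes
gilbreath_ok = True
for k in range(1, 200):
    row = delta(row)
    gilbreath_ok = gilbreath_ok and row[0] == 1

# Andrica: sqrt(p_{n+1}) - sqrt(p_n) < 1 on the same range.
andrica_ok = all(sqrt(q) - sqrt(p) < 1 for p, q in zip(primes, primes[1:]))

print(delta(delta([1, 4, 9, 16, 25])), gilbreath_ok, andrica_ok)
```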


If the sum of the proper divisors of a number $n$ is equal to $n$, then $n$ is called a perfect number. For example,
6 is perfect as its proper divisors 1, 2, 3 sum up to 6. All currently known perfect numbers are even. The
question whether odd perfect numbers exist is probably the oldest open problem in mathematics and not
settled. Perfect numbers were familiar to Pythagoras and his followers already. Calendar coincidences like that
we have 6 work days and the moon needs "perfect" 28 days to circle the earth could have helped to promote
the "mystery" of perfect numbers. Euclid of Alexandria (300-275 BC) was the first to realize that if $2^n - 1$
is prime then $k = 2^{n-1}(2^n - 1)$ is a perfect number: [Proof: let $\sigma(n)$ be the sum of all factors of $n$, including
$n$. Now $\sigma((2^n - 1) 2^{n-1}) = \sigma(2^n - 1) \sigma(2^{n-1}) = 2^n (2^n - 1) = 2 \cdot 2^{n-1} (2^n - 1)$ shows $\sigma(k) = 2k$ and verifies
that $k$ is perfect.] Around 100 AD, Nicomachus of Gerasa (60-120) classified in his work "Introduction to
Arithmetic" numbers based on the concept of perfect numbers and listed four perfect numbers. Only much later it
became clear that Euclid got all the even perfect numbers: Euler showed that all even perfect numbers are of
the form $(2^n - 1) 2^{n-1}$, where $2^n - 1$ is prime. The factor $2^n - 1$ is called a Mersenne prime. [Proof: Assume
$N = 2^k m$ is perfect where $m$ is odd and $k > 0$. Then $2^{k+1} m = 2N = \sigma(N) = (2^{k+1} - 1) \sigma(m)$. This gives
$\sigma(m) = 2^{k+1} m/(2^{k+1} - 1) = m(1 + 1/(2^{k+1} - 1)) = m + m/(2^{k+1} - 1)$. Because $\sigma(m)$ and $m$ are integers,
also $m/(2^{k+1} - 1)$ is an integer. It must also be a factor of $m$. The only way that $\sigma(m)$ can be the sum of
only two of its factors is that $m$ is prime and so $2^{k+1} - 1 = m$.] The first 39 known Mersenne primes are
of the form $2^n - 1$ with n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253,
4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787,
1398269, 2976221, 3021377, 6972593, 13466917. There are 11 more known for which one does not know the
rank of the corresponding Mersenne prime: n = 20996011, 24036583, 25964951, 30402457, 32582657, 37156667,
42643801, 43112609, 57885161, 74207281, 77232917. The last was found in December 2017 only. It is unknown
whether there are infinitely many.
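Euclid's construction can be tested for small exponents. The sketch below computes $\sigma$ by trial division, which is only practical for small numbers; the function names and the exponent range are choices made here for illustration.

```python
def sigma(n):
    """Sum of all divisors of n, including n itself (trial division)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

def is_prime(n):
    """n > 1 is prime exactly when its only divisors are 1 and n."""
    return n > 1 and sigma(n) == n + 1

# Euclid: if 2^n - 1 is prime, then 2^(n-1) * (2^n - 1) is perfect.
perfect = [2 ** (n - 1) * (2 ** n - 1)
           for n in range(2, 14) if is_prime(2 ** n - 1)]

for k in perfect:
    print(k, sigma(k) == 2 * k)
```

The exponents that survive the primality test in this range are 2, 3, 5, 7 and 13, the start of the list of Mersenne exponents given above.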
A polynomial equation for which all coefficients and variables are integers is called a Diophantine equation.
The first Diophantine equation studied already by Babylonians is $x^2 + y^2 = z^2$. A solution $(x,y,z)$ of this
equation in positive integers is called a Pythagorean triple. For example, $(3,4,5)$ is a Pythagorean triple.
Since 1600 BC, it is known that all solutions to this equation are of the form $(x,y,z) = (2st, s^2 - t^2, s^2 + t^2)$ or
$(x,y,z) = (s^2 - t^2, 2st, s^2 + t^2)$, where $s, t$ are different integers. [Proof. Either $x$ or $y$ has to be even because
if both are odd, then the sum $x^2 + y^2$ is even but not divisible by 4 but the right hand side is either odd or
divisible by 4. Move the even one, say $x^2$, to the left and write $x^2 = z^2 - y^2 = (z - y)(z + y)$, then the right
hand side contains a factor 4 and is of the form $4s^2t^2$. Therefore $2s^2 = z + y$, $2t^2 = z - y$. Solving for $z, y$ gives
$z = s^2 + t^2$, $y = s^2 - t^2$, $x = 2st$.]

Analyzing Diophantine equations can be difficult. Only in 1995 was it established that the Fermat
equation $x^n + y^n = z^n$ has no solutions with $xyz \neq 0$ if $n > 2$. Here are some open problems for Diophantine
equations. Are there nontrivial solutions to the following Diophantine equations?


$x^6 + y^6 + z^6 + u^6 + v^6 = w^6$      $x, y, z, u, v, w > 0$
$x^5 + y^5 + z^5 = w^5$                  $x, y, z, w > 0$
$x^k + y^k = n! \, z^k$                   $k \geq 2$, $n > 1$
$x^a + y^b = z^c$                        $a, b, c > 2$, $\gcd(x, y, z) = 1$


The last equation is called Super Fermat. A Texan banker Andrew Beal once sponsored a prize of 100’000
dollars for a proof or counter example to the statement: "If $x^p + y^q = z^r$ with $p,q,r > 2$, then $\gcd(x,y,z) > 1$."
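Two of the statements above lend themselves to a small experiment: the parametrization $(2st, s^2 - t^2, s^2 + t^2)$ of Pythagorean triples, and the observation behind Beal's statement that known solutions of $x^p + y^q = z^r$ with exponents larger than 2 have a common factor. The search bounds below (bases below 30, exponents 3 to 7) are arbitrary assumptions, so the second part is an illustration and proves nothing.

```python
from math import gcd

def triple(s, t):
    """Pythagorean triple from integers s > t > 0."""
    return (2 * s * t, s * s - t * t, s * s + t * t)

triples = [triple(s, t) for s in range(2, 6) for t in range(1, s)]
print(triples[0])

# Small search for x^p + y^q = z^r with all exponents > 2.
LIMIT, EXPONENTS = 30, range(3, 8)
power_table = {}                       # value -> bases z with z^r == value
for base in range(1, LIMIT):
    for e in EXPONENTS:
        power_table.setdefault(base ** e, []).append(base)

beal_hits = []                         # entries (x, y, z, gcd(x, y, z))
for x in range(1, LIMIT):
    for p in EXPONENTS:
        for y in range(x, LIMIT):
            for q in EXPONENTS:
                for z in power_table.get(x ** p + y ** q, []):
                    beal_hits.append((x, y, z, gcd(gcd(x, y), z)))

print(sorted(set(beal_hits))[:3])
```

Every hit found in this range, such as $2^3 + 2^3 = 2^4$ or $3^3 + 6^3 = 3^5$, has a common factor larger than 1.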
Given a prime like 7 and a number $n$ we can add or subtract multiples of 7 from $n$ to get a number in
$\{0,1,2,3,4,5,6\}$. We write for example $19 = 12 \bmod 7$ because 12 and 19 both leave the remainder 5 when dividing
by 7. Or $5 \cdot 6 = 2 \bmod 7$ because 30 leaves the remainder 2 when dividing by 7. The most important theorem in
elementary number theory is Fermat’s little theorem which tells that if $a$ is an integer and $p$ is prime then
$a^p - a$ is divisible by $p$. For example $2^7 - 2 = 126$ is divisible by 7. [Proof: use induction. For $a = 0$ it is clear.
The binomial expansion shows that $(a+1)^p - a^p - 1$ is divisible by $p$. This means $(a+1)^p - (a+1) = (a^p - a) + mp$
for some $m$. By induction, $a^p - a$ is divisible by $p$ and so is $(a+1)^p - (a+1)$.] Another beautiful theorem is
Wilson’s theorem which allows to characterize primes: It tells that $(n-1)! + 1$ is divisible by $n$ if and only
if $n$ is a prime number. For example, for $n = 5$, we verify that $4! + 1 = 25$ is divisible by 5. [Proof: assume
$n$ is prime. There are then exactly two numbers $1, -1$ for which $x^2 - 1$ is divisible by $n$. The other numbers
in $1, \dots, n-1$ can be paired as $(a,b)$ with $ab = 1$. Rearranging the product shows $(n-1)! = -1$ modulo $n$.
Conversely, if $n$ is not prime, then $n$ has a factor $k$ with $1 < k < n$. This $k$ divides $(n-1)!$, so it can not
divide $(n-1)! + 1$, and neither can $n$.]
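Both theorems are easy to test on small numbers. The sketch below uses Wilson's criterion as a (very inefficient) primality test and then checks Fermat's little theorem for the primes it finds; the function name and the ranges are chosen here for illustration.

```python
from math import factorial

def is_prime_wilson(n):
    """Wilson: n > 1 is prime exactly if (n-1)! + 1 is divisible by n."""
    return n > 1 and (factorial(n - 1) + 1) % n == 0

primes = [n for n in range(2, 60) if is_prime_wilson(n)]
print(primes)

# Fermat's little theorem: a^p - a is divisible by p for every prime p.
fermat_ok = all((a ** p - a) % p == 0 for p in primes for a in range(30))
print(fermat_ok)

# The modular examples from the text.
print(19 % 7, 12 % 7, (5 * 6) % 7)
```

Wilson's theorem is of theoretical rather than practical use as a primality test, since the factorial grows far too fast.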

The solution to systems of linear equations like $x = 3 \pmod 5$, $x = 2 \pmod 7$ is given by the Chinese
remainder theorem. To solve it, continue adding 5 to 3 until we reach a number which leaves remainder 2 when divided by 7:
on the list 3, 8, 13, 18, 23, 28, 33, 38, the number 23 is the solution. Since 5 and 7 have no common divisor, the
system of linear equations has a solution.
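The search described above translates directly into a loop. This is a minimal sketch of that naive procedure, not an efficient algorithm; the function name `crt_search` is hypothetical.

```python
def crt_search(a, m, b, n):
    """Find x with x = a (mod m) and x = b (mod n) by stepping through a, a+m, a+2m, ...

    Returns None if none of the first n candidates works, which happens
    only when m and n have a common divisor and the system is inconsistent.
    """
    x = a % m
    for _ in range(n):
        if x % n == b % n:
            return x
        x += m
    return None

print(crt_search(3, 5, 2, 7))    # the example from the text
print(crt_search(1, 4, 2, 6))    # 4 and 6 share the factor 2: no solution
```

Checking at most n candidates is enough because the remainders modulo n repeat after n steps.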

For a given $n$, how do we solve $x^2 - yn = 1$ for the unknowns $x, y$? A solution produces a square root $x$ of 1
modulo $n$. For prime $n$, only $x = 1, x = -1$ are the solutions. For composite $n = pq$ with different odd primes $p, q$,
more solutions appear: those $x$ with $x = 1 \bmod p$ and $x = -1 \bmod q$, or the other way round. Finding such an $x$
is equivalent to factoring $n$, because the greatest common divisor of $x - 1$ and $n$ is then a nontrivial factor of $n$.
Factoring is difficult if the numbers are large. This ensures
that encryption algorithms work and that bank accounts and communications stay safe. Number theory,
once the least applied discipline of mathematics, has become one of the most applied ones.
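The link between square roots of 1 and factoring can be seen on a toy example. The sketch below finds all square roots of 1 modulo a small $n$ by brute force (hopeless for numbers of cryptographic size) and extracts factors with a gcd; the value $n = 35$ and the function name are assumptions for illustration.

```python
from math import gcd

def square_roots_of_one(n):
    """All x in 1..n-1 with x^2 - 1 divisible by n, found by brute force."""
    return [x for x in range(1, n) if (x * x - 1) % n == 0]

n = 35
roots = square_roots_of_one(n)                 # contains 1 and n - 1
nontrivial = [x for x in roots if x not in (1, n - 1)]
factors = sorted({gcd(x - 1, n) for x in nontrivial})

print(roots, nontrivial, factors)
print(square_roots_of_one(37))                 # prime modulus: only 1 and -1
```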
Lecture 5: Algebra


Algebra studies algebraic structures like "groups" and "rings". The theory allows to solve polynomial
equations, characterize objects by their symmetries and is the heart and soul of many puzzles. Lagrange claims
Diophantus to be the inventor of Algebra, others argue that the subject started with solutions of quadratic
equations by Mohammed ben Musa Al-Khwarizmi in the book Al-jabr w’al muqabala of 830 AD. Equations
like $x^2 + 10x = 39$ are solved there by completing the square: add 25 on both sides to get
$x^2 + 10x + 25 = 64$ and so $x + 5 = 8$ so that $x = 3$.

The use of variables, taught in school as part of elementary algebra, was introduced later. Ancient texts only
dealt with particular examples and calculations were done with concrete numbers in the realm of arithmetic.
Francois Viete (1540-1603) was the first to use letters like A, B, C, X for variables.


The search for formulas for polynomial equations of degree 3 and 4 lasted 700 years. In the 16’th century,
the cubic equation and quartic equations were solved. Niccolo Tartaglia and Gerolamo Cardano reduced
the cubic to the quadratic: [first remove the quadratic part with $X = x - a/3$ so that $X^3 + aX^2 + bX + c$
becomes the depressed cubic $x^3 + px + q$. Now substitute $x = u - p/(3u)$ to get a quadratic equation
$(u^6 + qu^3 - p^3/27)/u^3 = 0$ for $u^3$.] Lodovico Ferrari showed that the quartic equation can be reduced to the
cubic. For the quintic however no formulas could be found. It was Paolo Ruffini, Niels Abel and Evariste
Galois who independently realized that there are no formulas in terms of roots which allow to "solve" equations
$p(x) = 0$ for polynomials $p$ of degree larger than 4. This was an amazing achievement and the birth of "group
theory".
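The reduction of the depressed cubic to a quadratic can be followed numerically. The sketch below solves the quadratic $w^2 + qw - p^3/27 = 0$ for $w = u^3$, takes one complex cube root and returns $x = u - p/(3u)$; it yields a single root only, and the function name and test values are assumptions for illustration.

```python
import cmath

def depressed_cubic_root(p, q):
    """One (possibly complex) root of x^3 + p*x + q = 0 via x = u - p/(3u)."""
    disc = cmath.sqrt(q * q + 4 * p ** 3 / 27)
    w = (-q + disc) / 2              # w = u^3 solves w^2 + q*w - p^3/27 = 0
    if w == 0:
        w = (-q - disc) / 2          # use the other solution of the quadratic
    if w == 0:
        return 0j                    # only happens for p = q = 0
    u = w ** (1 / 3)                 # principal complex cube root
    return u - p / (3 * u)

x = depressed_cubic_root(-6, -9)     # x^3 - 6x - 9 = 0
print(x, abs(x ** 3 - 6 * x - 9))
```

Complex arithmetic is used throughout because the intermediate square root can be imaginary even when all three roots of the cubic are real.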


Two important algebraic structures are groups and rings. 


In a group $G$ one has an operation $*$, an inverse $a^{-1}$ and a one-element 1 such that $a * (b * c) = (a * b) * c$,
$a * 1 = 1 * a = a$, $a * a^{-1} = a^{-1} * a = 1$. For example, the set $\mathbb{Q}^*$ of nonzero fractions $p/q$ with multiplication operation $*$
and inverse $1/a$ form a group. The integers with addition and inverse $a^{-1} = -a$ and "1"-element 0 form a group
too. A ring $R$ has two compositions $+$ and $*$, where the plus operation is a group satisfying $a + b = b + a$ in which
the one element is called 0. The multiplication operation $*$ has all group properties on $R^*$ except the existence
of an inverse. The two operations $+$ and $*$ are glued together by the distributive law $a * (b + c) = a * b + a * c$.
Examples of rings are the integers, the rational numbers or the real numbers. The latter two are
actually fields, rings for which the multiplication on nonzero elements is a group too. The ring of integers is
no field because an integer like 5 has no multiplicative inverse. The ring of rational numbers however forms a field.


Why is the theory of groups and rings not part of arithmetic? First of all, a crucial ingredient of algebra is
the appearance of variables and computations with them without using concrete numbers. Second,
the algebraic structures are not restricted to "numbers". Groups and rings are general structures and extend
for example to objects like the set of all possible symmetries of a geometric object. The set of all similarity
operations on the plane for example forms a group. An important example of a ring is the polynomial ring
of all polynomials. Given any ring $R$ and a variable $x$, the set $R[x]$ consists of all polynomials with coefficients
in $R$. The addition and multiplication is done like in $(x^2 + 3x + 1) + (x - 7) = x^2 + 4x - 6$. The problem to
factor a given polynomial with integer coefficients into polynomials of smaller degree, $x^2 - x - 2$ for example
can be written as $(x + 1)(x - 2)$, has a number theoretical flavor. Because symmetries of some structure form
a group, we also have intimate connections with geometry. But this is not the only connection with geometry.
Geometry also enters through the polynomial rings with several variables. Solutions to $f(x,y) = 0$ lead to
geometric objects with shape and symmetry which sometimes even have their own algebraic structure. They
are called varieties, a central object in algebraic geometry, objects which in turn have been generalized
Arithmetic introduces addition and multiplication of numbers. Both form a group. The operations can be
written additively or multiplicatively. Let’s look at this a bit closer: for integers, fractions and reals and the
addition +, the 1 element 0 and inverse $-g$, we have a group. Many groups are written multiplicatively where
the 1 element is 1. In the case of fractions or reals, 0 is not part of the multiplicative group because it is not
possible to divide by 0. The nonzero fractions or the nonzero reals form a group. In all these examples the
groups satisfy the commutative law $g * h = h * g$.

Here is a group which is not commutative: let $G$ be the set of all rotations in space, which leave the unit
cube invariant. There are 3*3 = 9 rotations around the major coordinate axes, then 6 rotations around axes
connecting midpoints of opposite edges, then 2*4 = 8 rotations around diagonals. Together with the identity rotation
$e$, these are 24 rotations. The group operation is the composition of these transformations.
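The count of 24 rotations can be confirmed by generating the group on a computer. The sketch below represents rotations as integer 3x3 matrices and closes up the set generated by two quarter turns; the choice of these two generators, and the use of the matrix trace to sort rotations by angle, are assumptions made here for illustration.

```python
from collections import Counter

def matmul(A, B):
    """Product of two 3x3 matrices stored as tuples of tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

RX = ((1, 0, 0), (0, 0, -1), (0, 1, 0))    # quarter turn about the x-axis
RZ = ((0, -1, 0), (1, 0, 0), (0, 0, 1))    # quarter turn about the z-axis
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# Close up the set generated by RX and RZ under composition.
group = {IDENTITY}
frontier = [IDENTITY]
while frontier:
    g = frontier.pop()
    for r in (RX, RZ):
        h = matmul(r, g)
        if h not in group:
            group.add(h)
            frontier.append(h)

# A rotation by angle t has trace 1 + 2cos(t): 3 (identity), 1 (quarter turns),
# -1 (half turns), 0 (turns by a third about the space diagonals).
by_trace = Counter(g[0][0] + g[1][1] + g[2][2] for g in group)

print(len(group), dict(by_trace), matmul(RX, RZ) == matmul(RZ, RX))
```

The trace count splits the 24 elements into 1 identity, 6 quarter turns, 9 half turns (3 about coordinate axes and 6 about edge axes) and 8 diagonal rotations, matching the count in the text.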

Another example of a group is $S_4$, the set of all permutations of four numbers $(1,2,3,4)$. If $g: (1,2,3,4) \to
(2,3,4,1)$ is a permutation and $h: (1,2,3,4) \to (3,1,2,4)$ is another permutation, then we can combine the
two and define $h * g$ as the permutation which does first $g$ and then $h$. We end up with the permutation
$(1,2,3,4) \to (1,2,4,3)$. The rotational symmetry group of the cube happens to be the same as the group
$S_4$. To see this "isomorphism", label the 4 space diagonals in the cube by 1, 2, 3, 4. Given a rotation, we can
look at the induced permutation of the diagonals and every rotation corresponds to exactly one permutation.
The symmetry group can be introduced for any geometric object, for shapes like the triangle, the cube, the
octahedron or tilings in the plane.
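The composition $h * g$ from the text can be computed mechanically. In this sketch a permutation is stored as a dictionary mapping each number to its image; that representation and the helper name `compose` are choices made here for illustration.

```python
from itertools import permutations

g = {1: 2, 2: 3, 3: 4, 4: 1}     # (1,2,3,4) -> (2,3,4,1)
h = {1: 3, 2: 1, 3: 2, 4: 4}     # (1,2,3,4) -> (3,1,2,4)

def compose(second, first):
    """Permutation which applies `first` and then `second`."""
    return {x: second[first[x]] for x in first}

hg = compose(h, g)               # first g, then h
gh = compose(g, h)               # first h, then g

print([hg[x] for x in (1, 2, 3, 4)])
print([gh[x] for x in (1, 2, 3, 4)])
print(len(list(permutations((1, 2, 3, 4)))))   # number of elements of S_4
```

The two orders of composition give different permutations, so $S_4$ is not commutative either.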


Symmetry groups describe geometric shapes by algebra. 


Many puzzles are groups. A popular puzzle, the 15-puzzle was invented in 1874 by Noyes Palmer Chapman 
in the state of New York. If the hole is given the number 0, then the task of the puzzle is to order a given 
random start permutation of the 16 pieces. To do so, the user is allowed to transposes 0 with a neighboring 
piece. Since every step changes the signature s of the permutation and changes the taxi-metric distance d of 0 
to the end position by 1, only situations with even s + d can be reached. It was Sam Loyd who suggested to 
start with an impossible solution and as an evil plot to offer 1000 dollars for a solution. The 15 puzzle group 
has 16!/2 elements and the "god number" is 80 (the bounds 152 and 208 are those known for the larger 24-puzzle). The Rubik cube is an other famous puzzle,
which is a group. Exactly 100 years after the invention of the 15 puzzle, the Rubik puzzle was introduced in
1974. It is still popular and the world record is to have it solved in 5.55 seconds. All cubes from 2x2x2 to
7x7x7 in a row have been solved in a total time of 6 minutes. For the 3x3x3 cube, the God number is now
known to be 20: one can always solve it in 20 or fewer moves.
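
The parity condition "even s + d" for the 15-puzzle can be tested mechanically. The sketch below assumes that the goal position is 1,2,...,15 followed by the hole in the last cell; the function names are hypothetical.

```python
def permutation_parity(perm):
    """Parity (0 even, 1 odd) of a permutation of 0..n-1, via its cycle decomposition."""
    seen = [False] * len(perm)
    parity = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        parity ^= (length - 1) & 1  # a cycle of length L consists of L-1 transpositions
    return parity

def satisfies_parity_condition(board):
    """board: 16 entries read row by row, 0 marks the hole.
    Assumed goal of this sketch: 1,2,...,15 followed by the hole."""
    goal_index = [(tile - 1) % 16 for tile in board]  # tile t belongs at t-1, the hole at 15
    s = permutation_parity(goal_index)
    hole = board.index(0)
    d = abs(hole // 4 - 3) + abs(hole % 4 - 3)  # taxi distance of the hole to its end position
    return (s + d) % 2 == 0

solved = list(range(1, 16)) + [0]
loyd = list(range(1, 14)) + [15, 14, 0]       # Sam Loyd's start: 14 and 15 exchanged
print(satisfies_parity_condition(solved), satisfies_parity_condition(loyd))
```

Each legal move flips both s and d, so s + d keeps its parity. That is why a position failing the test, like the one with 14 and 15 exchanged, can never be solved.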


Many puzzles are groups. 


A small Rubik type game is the "floppy", which is a third of the Rubik cube and which has only 192 elements.
Another example is Meffert's great challenge. Probably the simplest example of a Rubik type puzzle is
the pyramorphix. It is a puzzle based on the tetrahedron. Its group has only 24 elements. It is the group
of all possible permutations of 4 elements. It is the same group as the group of all rotation
symmetries of the cube in three dimensions and also is relevant when understanding the solutions to the quartic
equation discussed at the beginning. The circle is closed.


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018 


Lecture 6: Calculus 


Calculus generalizes the process of taking differences and taking sums. Differences measure change, sums 
explore how quantities accumulate. The procedure of taking differences has a limit called derivative. The 
activity of taking sums leads to the integral. Sum and difference are dual to each other and related in an 
intimate way. In this lecture, we look first at a simple set-up, where functions are evaluated on integers and 
where we do not take any limits. 

Tens of thousands of years ago, numbers were represented by units like 1,1,1,1,1,1,.... The units were
carved into sticks or bones like the Ishango bone. It took thousands of years until numbers were represented
with symbols like 0,1,2,3,4,.... Using the modern concept of function, we can say f(0) = 0, f(1) = 1, f(2) =
2, f(3) = 3 and mean that the function f assigns to an input like 1001 an output like f(1001) = 1001. Now
look at Df(n) = f(n+1) - f(n), the difference. We see that Df(n) = 1 for all n. We can also formalize the
summation process. If g(n) = 1 is the constant 1 function, then Sg(n) = g(0) + g(1) + ... + g(n-1) =
1 + 1 + ... + 1 = n. We see that Df = g and Sg = f. If we start with f(n) = n and apply summation to
that function, then Sf(n) = f(0) + f(1) + f(2) + ... + f(n-1), leading to the values 0,1,3,6,10,15,21,...
for n = 1,2,3,.... The new function g = Sf satisfies g(2) = 1, g(3) = 3, g(4) = 6, etc. The values are called
the triangular numbers. From g we can get back f by taking differences: Dg(n) = g(n+1) - g(n) = f(n). For
example Dg(5) = g(6) - g(5) = 15 - 10 = 5, which indeed is f(5). Finding a formula for the sum Sf(n) is not
so easy. Can you do it? When Carl Friedrich Gauss was a 9 year old school kid, his teacher, a Mr. Büttner, gave
him the task to sum up the first 100 numbers 1 + 2 + ... + 100. Gauss found the answer immediately by pairing
things up: to add up 1 + 2 + 3 + ... + 100 he would write this as (1 + 100) + (2 + 99) + ... + (50 + 51), leading
to 50 terms of 101, to get for n = 101 the value g(n) = n(n-1)/2 = 5050. Taking differences again is easier:
Dg(n) = n(n+1)/2 - n(n-1)/2 = n = f(n). If we add up the triangular numbers, we compute h = Sg, which
has the values 0,1,4,10,20,35,... for n = 2,3,4,.... These are the tetrahedral numbers because h(n+2) balls
are needed to build a tetrahedron of side length n. For example, h(6) = 20 golf balls are needed to build a
tetrahedron of side length 4. The formula which holds for h is h(n) = n(n-1)(n-2)/6. Here is the fundamental
theorem of calculus, which is the core of calculus:


$$ SDf(n) = f(n) - f(0), \qquad DSf(n) = f(n) . $$

Proof.

$$ SDf(n) = \sum_{k=0}^{n-1} [f(k+1) - f(k)] = f(n) - f(0), $$

$$ DSf(n) = \sum_{k=0}^{n} f(k) - \sum_{k=0}^{n-1} f(k) = f(n) . $$

The process of adding up numbers will lead to the integral $\int_0^x f(t)\,dt$. The process of taking differences will
lead to the derivative $\frac{d}{dx} f(x)$.

The familiar notation is

$$ \int_0^x \frac{d}{dt} f(t)\,dt = f(x) - f(0), \qquad \frac{d}{dx} \int_0^x f(t)\,dt = f(x) . $$


If we define [n]^0 = 1, [n]^1 = n, [n]^2 = n(n-1), [n]^3 = n(n-1)(n-2), then D[n]^1 = [n]^0, D[n]^2 = 2[n]^1,
D[n]^3 = 3[n]^2 and in general D[n]^k = k[n]^(k-1). The triangular and tetrahedral numbers are [n]^2/2 and [n]^3/6.
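
The difference and summation operators are easy to experiment with on a computer. The following is a minimal sketch in Python; the names D, S and falling are chosen here for illustration, and the falling power is written without dividing by a factorial.

```python
def D(f):
    """Difference operator: Df(n) = f(n+1) - f(n)."""
    return lambda n: f(n + 1) - f(n)

def S(f):
    """Summation operator: Sf(n) = f(0) + f(1) + ... + f(n-1)."""
    return lambda n: sum(f(k) for k in range(n))

f = lambda n: n          # the identity function f(n) = n
g = S(f)                 # triangular numbers n(n-1)/2
h = S(g)                 # tetrahedral numbers n(n-1)(n-2)/6

def falling(k):
    """Falling power n(n-1)...(n-k+1) with k factors."""
    def power(n):
        result = 1
        for j in range(k):
            result *= n - j
        return result
    return power

for n in range(10):
    assert S(D(h))(n) == h(n) - h(0)               # SDf(n) = f(n) - f(0)
    assert D(S(h))(n) == h(n)                      # DSf(n) = f(n)
    assert D(falling(3))(n) == 3 * falling(2)(n)   # difference rule for falling powers

print([g(n) for n in range(1, 8)])
print([h(n) for n in range(2, 8)])
```

The loop checks both halves of the fundamental theorem on the tetrahedral numbers, and the two printed lists are the triangular and tetrahedral values.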


The calculus you have just seen contains the essence of single variable calculus. This core idea will become
more powerful and natural if we use it together with the concept of limit.

Problem: The Fibonacci sequence 1,1,2,3,5,8,13,21,... satisfies the rule f(x) = f(x-1) + f(x-2). For
example, f(6) = 8. What is the function g = Df, if we assume f(0) = 0? We take the difference between
successive numbers and get the sequence of numbers 0,1,1,2,3,5,8,... which is the same sequence again.
If we take the same function f but now compute the function h(n) = Sf(n), we get the sequence
1,2,4,7,12,20,33,.... What sequence is that? Solution: Because Df(x) = f(x-1), where f(-1) = 1, we have
f(x) - f(0) = SDf(x) = f(-1) + Sf(x-1) so that Sf(x) = f(x+1) - 1 = f(x+1) - f(1). Summing the Fibonacci
sequence produces the Fibonacci sequence shifted to the left with f(1) = 1 subtracted. It has been relatively
easy to find the sum, because we knew what the difference operation did. This example shows: we can study
differences to understand sums.

Problem: The function f(n) = 2^n is called the exponential function. We have for example f(0) = 1, f(1) =
2, f(2) = 4, .... It leads to the sequence of numbers

n      0  1  2  3  4   5   6   7    8
f(n)   1  2  4  8  16  32  64  128  256

We can verify that f satisfies the equation Df(x) = f(x), because Df(x) = 2^(x+1) - 2^x = (2 - 1) 2^x = 2^x.
This is an important special case of the fact that


The derivative of the exponential function is the exponential function itself. 


The function 2^x is a special case of the exponential function when the Planck constant is equal to 1. We will see
that the relation will hold for any h > 0 and also in the limit h -> 0, where it becomes the classical exponential
function e^x which plays an important role in science.
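
The text does not spell out the version with step size h. One natural reading, which is an assumption of this sketch, is the difference quotient (f(x+h) - f(x))/h together with the function (1+h)^(x/h), which for h = 1 is 2^x.

```python
import math

def exp_h(x, h):
    """Assumed discrete exponential with step h: (1 + h)**(x / h). For h = 1 this is 2**x."""
    return (1 + h) ** (x / h)

def D_h(f, h):
    """Assumed difference quotient with step h: (f(x + h) - f(x)) / h."""
    return lambda x: (f(x + h) - f(x)) / h

for h in (1, 0.5, 0.01):
    f = lambda x, h=h: exp_h(x, h)
    x = 3.0
    print(h, D_h(f, h)(x), f(x))    # the two values agree up to rounding

print(exp_h(1.0, 1e-6), math.e)     # for small h, exp_h(1, h) is close to e
```

Under these assumptions f(x+h) - f(x) = ((1+h) - 1) f(x) = h f(x), so the difference quotient reproduces f for every h > 0.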

Calculus has many applications: computing areas, volumes, solving differential equations. It even has
applications in arithmetic. Here is an example for illustration. It is a proof that $\pi$ is irrational. The theorem
is due to Johann Heinrich Lambert (1728-1777). We show here the proof by Ivan Niven, which is given in a book of
Niven-Zuckerman-Montgomery. It originally appeared in 1947 (Ivan Niven, Bull. Amer. Math. Soc. 53 (1947), 509). The
proof illustrates how calculus can help to get results in arithmetic.

Proof. Assume $\pi = a/b$ with positive integers a and b. For any positive integer n define

$$ f(x) = x^n (a - bx)^n / n! . $$

We have $f(x) = f(\pi - x)$ and

$$ 0 < f(x) < \pi^n a^n / n! \qquad (*) $$

for $0 < x < \pi$. For all $0 \le j < n$, the j-th derivative of f is zero at 0 and $\pi$, and for $n \le j$, the j-th
derivative of f is an integer at 0 and $\pi$.

The function $F(x) = f(x) - f^{(2)}(x) + f^{(4)}(x) - \dots + (-1)^n f^{(2n)}(x)$ has the property that F(0) and $F(\pi)$ are
integers and $F + F'' = f$. Therefore, $(F'(x) \sin(x) - F(x) \cos(x))' = f(x) \sin(x)$. By the fundamental theorem of
calculus, $\int_0^\pi f(x) \sin(x)\,dx$ is an integer. Inequality (*) implies however that this integral is between 0 and 1 for
large enough n. For such an n we get a contradiction.


Lecture 7: Set Theory and Logic


Set theory studies sets, the fundamental building blocks of mathematics. While logic describes the language of
all mathematics, set theory provides the framework for additional structures like category theory. In Cantorian
set theory, one can compute with subsets of a given set X like with numbers. There are two basic operations:
the addition A + B of two sets is defined as the set of all points which are in exactly one of the sets. The
multiplication A · B of two sets contains all the points which are in both sets. With the symmetric difference
as addition and the intersection as multiplication, the subsets of a given set X become a ring. This Boolean
ring has the property A + A = 0 and A · A = A for all sets. The zero element is the empty set $\emptyset$ = {}.
Because A + A = 0, every set A is its own additive inverse; the complement of A in X is A + X. The
multiplicative 1-element is the set X because X · A = A. As in the ring Z of integers, the addition and
multiplication on sets is commutative. Multiplication does not have an inverse in general.

Two sets A, B have the same cardinality if there exists a one-to-one and onto map (a bijection)
from A to B. For finite sets, this means that they have the same number of elements. Sets which do not have
finitely many elements are called infinite. Do all sets with infinitely many elements have the same cardinality?
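
The Boolean ring of subsets can be explored directly with Python's built-in sets. The sketch below uses the three-element set X = {1,2,3} as an arbitrary example; add and mul are names chosen here.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

add = lambda A, B: A ^ B   # symmetric difference plays the role of A + B
mul = lambda A, B: A & B   # intersection plays the role of A . B
zero, one = frozenset(), X

for A in subsets:
    assert add(A, A) == zero and mul(A, A) == A      # A + A = 0 and A . A = A
    assert add(A, zero) == A and mul(one, A) == A    # neutral elements
    assert add(A, one) == X - A                      # A + X is the complement of A
    for B in subsets:
        assert add(A, B) == add(B, A) and mul(A, B) == mul(B, A)
        for C in subsets:
            assert mul(A, add(B, C)) == add(mul(A, B), mul(A, C))  # distributivity
print(len(subsets), "subsets checked")
```

The assertions run through all subsets and confirm the ring properties mentioned above for this small example.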
The integers Z and the natural numbers N for example are infinite sets which have the same cardinality: the
map f(2n) = n, f(2n+1) = -(n+1) establishes a bijection between N and Z. Also the rational numbers Q have
the same cardinality as N. Associate a fraction p/q with a point (p,q) in the plane. Now cut out the column
q = 0 and run the Ulam spiral on the modified plane. This provides a numbering of the rationals. Sets which
can be counted are said to have cardinality $\aleph_0$. Does an interval have the same cardinality as the reals? Even
though an interval like $I = (-\pi/2, \pi/2)$ has finite length, one can bijectively map it to R with the tan function, as
tan : I -> R is bijective. Similarly, one can see that any two intervals of positive length have the same cardinality.
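
Both countings can be tried out in a few lines. In this sketch the rationals are listed by walking through growing squares of lattice points (p,q), which is a simple stand-in for the Ulam spiral and not the spiral itself; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice

def nat_to_int(m):
    """The map f(2n) = n, f(2n+1) = -(n+1) from N to Z."""
    return m // 2 if m % 2 == 0 else -(m // 2 + 1)

def rationals():
    """Enumerate Q by visiting lattice points (p, q) in growing squares.
    The column q = 0 is skipped and each rational is reported only once."""
    seen = set()
    for r in count(1):
        for p in range(-r, r + 1):
            for q in range(-r, r + 1):
                if max(abs(p), abs(q)) != r or q == 0:
                    continue
                x = Fraction(p, q)
                if x not in seen:
                    seen.add(x)
                    yield x

print([nat_to_int(m) for m in range(9)])
print(list(islice(rationals(), 10)))
```

Every fraction p/q is reached after finitely many steps, which is all that is needed for a numbering of the rationals.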
It was a great moment of mathematics when Georg Cantor realized in 1874 that the interval (0,1) does not
have the same cardinality as the natural numbers. His argument is ingenious: assume we could count the
points $a_1, a_2, \dots$. If $0.a_{i1}a_{i2}a_{i3}\dots$ is the decimal expansion of $a_i$, define the real number $b = 0.b_1b_2b_3\dots$, where
$b_i = a_{ii} + 1$ mod 10. Because this number b does not agree at the first decimal place with $a_1$, at the second
place with $a_2$ and so on, the number b does not appear in that enumeration of all reals. It has positive distance
at least $10^{-i}$ from the i'th number (and any representation of the number by a decimal expansion which is
equivalent). This is a contradiction. The new cardinality, the continuum, is also denoted $\aleph_1$. The reals are
uncountable. This gives elegant proofs like the existence of transcendental numbers, numbers which are not
algebraic, meaning that they are not the root of any polynomial with integer coefficients: algebraic numbers can
be counted. Similarly as one can establish a bijection between the natural numbers N and the integers Z, there
is a bijection f between the interval I and the unit square: if $x = 0.x_1x_2x_3\dots$ is the decimal expansion of x then
$f(x) = (0.x_1x_3x_5\dots, 0.x_2x_4x_6\dots)$ is the bijection.

Are there cardinalities larger than $\aleph_1$? Cantor answered also this question. He showed that for an infinite
set, the set of all subsets has a larger cardinality than the set itself. How does one see this? Assume there is a
bijection $x \to A(x)$ which maps each point x of X to a subset A(x) of X. Now look at the set
$B = \{x \mid x \notin A(x)\}$ and let b be the point in X which corresponds to B, so that A(b) = B. If $b \in B$, then
$b \notin A(b) = B$. On the other hand, if $b \notin B$, then $b \in B$. This is a contradiction: the set B does not appear
in the "enumeration" $x \to A(x)$ of all sets. The set of all subsets of N has the same cardinality as the continuum:
$A \to \sum_{j \in A} 1/2^j$ provides a map from P(N) to [0,1]. The set of all finite subsets of N however can be counted.
The set of all subsets of the real numbers has cardinality $\aleph_2$, etc.

Is there a cardinality between $\aleph_0$ and $\aleph_1$? In other words, is there a set which can
not be counted and which is strictly smaller than the continuum in the sense that one can not find a bijection
between it and R? This was the first of the 23 problems posed by Hilbert in 1900. The answer is surprising:
one has a choice. One can accept either the "yes" or the "no" as a new axiom. In both cases, mathematics
is still fine. The nonexistence of a cardinality between $\aleph_0$ and $\aleph_1$ is called the continuum hypothesis and
is usually abbreviated CH. It is independent of the other axioms making up mathematics. This was the work
of Kurt Gödel in 1940 and Paul Cohen in 1963.

The story of exploring the consistency and completeness
of axiom systems of all of mathematics is exciting. Euclid axiomatized geometry, Hilbert's program was more
ambitious. He aimed at a set of axioms for all of mathematics. The challenge to prove Euclid's 5th
postulate is paralleled by the quest to prove the CH. But the latter is much more fundamental because it deals
with all of mathematics and not only with some geometric space. Here are the Zermelo-Fraenkel Axioms
(ZFC) including the Axiom of choice (C) as established by Ernst Zermelo in 1908 and Adolf Fraenkel and
Thoralf Skolem in 1922.
Extension    If two sets have the same elements, they are the same.
Image        Given a function and a set, then the image of the function is a set too.
Pairing      For any two sets, there exists a set which contains both sets.
Property     For any property, there exists a set for which each element has the property.
Union        Given a set of sets, there exists a set which is the union of these sets.
Power        Given a set, there exists the set of all subsets of this set.
Infinity     There exists an infinite set.
Regularity   Every nonempty set has an element which has no intersection with the set.
Choice       Any set of nonempty sets leads to a set which contains an element from each.
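
The power set axiom ties back to Cantor's two diagonal constructions from earlier in this lecture, which can be played through on finite data. In the sketch below the digit table and the three-element set are arbitrary examples and the function names are hypothetical. The theorem in the text concerns infinite sets, but the diagonal set B can already be watched at work on a finite set.

```python
from itertools import combinations, product

def diagonal_digits(rows):
    """Given the first decimal digits of a_1, a_2, ... (row i lists the digits of a_i),
    return the digits b_i = a_ii + 1 mod 10, so that b differs from a_i at place i."""
    return [(rows[i][i] + 1) % 10 for i in range(len(rows))]

def missed_set(X, A):
    """For a map x -> A[x] from X to subsets of X, return B = {x : x not in A[x]}."""
    return frozenset(x for x in X if x not in A[x])

rows = [[1, 4, 1, 5], [7, 1, 8, 2], [3, 3, 3, 3], [0, 0, 0, 9]]
print(diagonal_digits(rows))

X = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
# Brute force over all 8**3 maps X -> P(X): the set B is never in the image.
for images in product(subsets, repeat=len(X)):
    A = dict(zip(X, images))
    assert missed_set(X, A) not in A.values()
```

The brute force loop confirms that no map from X to its power set hits the diagonal set B, so no such map is onto.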


There are other systems like ETCS, which is the elementary theory of the category of sets. In category
theory, not the sets but the categories are the building blocks. Categories do not form a set in general. It
elegantly avoids the Russell paradox too. The axiom of choice (C) has a nonconstructive nature which can
lead to seemingly paradoxical results like the Banach Tarski paradox: one can cut the unit ball into 5 pieces,
rotate and translate the pieces to assemble two identical balls of the same size as the original ball. Gödel and
Cohen showed that the axiom of choice is logically independent of the other axioms ZF. Other axioms in ZF
have been shown to be independent, like the axiom of infinity. A finitist would reject this axiom and work
without it. It is surprising what one can do with finite sets. The axiom of regularity excludes Russellian
sets like the set X of all sets which do not contain themselves. The Russell paradox is: Does X contain
X? It is popularized as the Barber riddle: a barber in a town only shaves the people who do not shave
themselves. Does the barber shave himself? Gödel's theorems of 1931 deal with mathematical theories
which are strong enough to do basic arithmetic in them.


First incompleteness theorem: In any theory there are true statements which can not be proved within the theory.

Second incompleteness theorem: In any theory, the consistency of the theory can not be proven within the theory.


The proof uses an encoding of mathematical sentences which allows one to state a liar paradoxical statement "this
sentence can not be proved". While the latter is an odd recreational entertainment gag, it is the core of a theorem
which makes striking statements about mathematics. These theorems are not limitations of mathematics; they
illustrate its infiniteness. How awful it would be if one could build an axiom system and enumerate mechanically
all possible truths from it.


Lecture 8: Probability theory


Probability theory is the science of chance. It starts with combinatorics and leads to a theory of stochastic
processes. Historically, probability theory originated from gambling problems as in Girolamo Cardano's
gambler's manual in the 16th century. A great moment of mathematics occurred when Blaise Pascal and
Pierre Fermat jointly laid a foundation of mathematical probability theory.

It took a while to formalize "randomness" precisely. Here is the setup as it was put forward by
Andrey Kolmogorov: all possible experiments of a situation are modeled by a set $\Omega$, the "laboratory". A
measurable subset of experiments is called an "event". Measurements are done by real-valued functions X.
These functions are called random variables and are used to observe the laboratory.

As an example, let us model the process of throwing a coin 5 times. An experiment is a word like httht, where h 
stands for “head" and t represents “tail". The laboratory consists of all such 32 words. We could look for example 
at the event A that the first two coin tosses are tail. It is the set A = {ttttt, tttth, tttht, ttthh, tthtt, tthth, tthht, tthhh}. 
We could look at the random variable which assigns to a word the number of heads. For every experiment, we 
get a value, like for example, X [tthht] = 2. 
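
The coin toss laboratory is small enough to be listed completely. Here is a minimal sketch; the variable names laboratory, A and X mirror the text but the encoding as Python strings is a choice made here.

```python
from itertools import product

# The laboratory: all words of length 5 over the letters h and t.
laboratory = ["".join(word) for word in product("ht", repeat=5)]

# The event A that the first two coin tosses are tail.
A = {word for word in laboratory if word.startswith("tt")}

def X(word):
    """Random variable: the number of heads in the word."""
    return word.count("h")

print(len(laboratory), sorted(A), X("tthht"))
```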

In order to make statements about randomness, the concept of a probability measure is needed. This is
a function P from the set of all events to the interval [0,1]. It should have the property that $P[\Omega] = 1$ and
$P[A_1 \cup A_2 \cup \cdots] = P[A_1] + P[A_2] + \cdots$, if $A_i$ is a sequence of disjoint events.

The most natural probability measure on a finite set $\Omega$ is $P[A] = |A|/|\Omega|$, where |A| stands for the number
of elements in A. It is the "number of good cases" divided by the "number of all cases". For example, to count
the probability of the event A that we throw 3 heads during the 5 coin tosses, we have |A| = 10 possibilities.
Since the entire laboratory has $|\Omega| = 32$ possibilities, the probability of the event is 10/32. In order to study
these probabilities, one needs combinatorics:


How many ways are there to:              The answer is:
rearrange or permute n elements          n! = n(n-1)...2·1
choose k from n with repetitions         n^k
pick k from n if order matters           n!/(n-k)!
pick k from n with order irrelevant      n!/(k!(n-k)!)


The expectation of a random variable X is defined as the sum $m = \sum_{\omega \in \Omega} X(\omega) P[\{\omega\}]$. In our coin toss
experiment, this is 5/2. The variance of X is the expectation of $(X - m)^2$. In our coin experiments, it is 5/4.
The square root of the variance is the standard deviation. This is the expected deviation from the mean. An
event happens almost surely if the event has probability 1.
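
The counting rules and the numbers quoted for the coin experiment can be recomputed exactly. This sketch uses Python's fractions module to avoid rounding; the choice n = 5, k = 3 for the counting rules is just an example.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, perm

n, k = 5, 3
print(factorial(n), n**k, perm(n, k), comb(n, k))  # the four counting rules for n=5, k=3

laboratory = ["".join(w) for w in product("ht", repeat=5)]
P = Fraction(1, len(laboratory))          # every word has the same probability

def X(word):
    """Random variable: the number of heads in the word."""
    return word.count("h")

prob_three_heads = sum(P for w in laboratory if X(w) == 3)
m = sum(X(w) * P for w in laboratory)                 # expectation
var = sum((X(w) - m) ** 2 * P for w in laboratory)    # variance
print(prob_three_heads, m, var)
```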

An important case of a random variable is $X(\omega) = \omega$ on $\Omega = R$ equipped with probability
$P[A] = \int_A \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$, the standard normal distribution. Analyzed first by Abraham de Moivre in 1733,
it was studied by Carl Friedrich Gauss in 1807 and is therefore also called Gaussian distribution.

Two random variables X,Y are called uncorrelated, if E[XY] = E[X] · E[Y]. If for any functions f,g also
f(X) and g(Y) are uncorrelated, then X,Y are called independent. Two random variables are said to have
the same distribution, if for any a < b, the events {a < X < b} and {a < Y < b} have the same probability. If X,Y
are uncorrelated, then the relation Var[X] + Var[Y] = Var[X + Y] holds, which is just Pythagoras theorem,
because uncorrelated can be understood geometrically: X - E[X] and Y - E[Y] are orthogonal. A common
problem is to study the sum of independent random variables $X_n$ with identical distribution. One abbreviates
this IID. Here are the three most important theorems, which we formulate in the case where all random variables
are assumed to have expectation 0 and standard deviation 1. Let $S_n = X_1 + \dots + X_n$ be the n'th sum of the
IID random variables. It is also called a random walk.


LLN   Law of Large Numbers: $S_n/n$ converges to 0.
CLT   Central Limit Theorem: $S_n/\sqrt{n}$ approaches the Gaussian distribution.
LIL   Law of Iterated Logarithm: $S_n/\sqrt{2n \log\log(n)}$ accumulates in [-1, 1].


The LLN shows that one can find out about the expectation by averaging experiments. The CLT explains why
one sees the standard normal distribution so often. The LIL finally gives us a precise estimate how fast $S_n$
grows. Things become interesting if the random variables are no longer independent. Generalizing LLN, CLT, LIL
to such situations is part of ongoing research.
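
For the simplest random walk the first two statements can be watched without any simulation. The sketch below assumes fair steps +1 and -1, which have expectation 0 and standard deviation 1, and computes the exact distribution of the sum with binomial coefficients; the thresholds 0.1 and 1 are arbitrary choices.

```python
from fractions import Fraction
from math import comb, sqrt

def distribution(n):
    """Exact law of S_n = X_1 + ... + X_n for fair +1/-1 steps."""
    return {2 * k - n: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

def prob(n, condition):
    """Probability that S_n satisfies the given condition."""
    return sum(p for s, p in distribution(n).items() if condition(s))

# LLN: the probability that |S_n/n| is at least 0.1 shrinks as n grows.
for n in (10, 100, 1000):
    print(n, float(prob(n, lambda s: abs(s) >= 0.1 * n)))

# CLT: the probability that |S_n/sqrt(n)| is at most 1 settles down as n grows.
for n in (100, 400, 1600):
    print(n, float(prob(n, lambda s: abs(s) <= sqrt(n))))
```

Because the walk lives on a lattice, the second list approaches the Gaussian value for the interval [-1,1] only slowly.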


Here are two open questions in probability theory:


Are numbers like $\pi$, e, $\sqrt{2}$ normal: do all digits appear with the same frequency?
What growth rates $\Lambda_n$ can occur in $S_n/\Lambda_n$ having limsup 1 and liminf -1?


For the second question, there are examples for $\Lambda_n = 1$, $\Lambda_n = \log(n)$ and of course $\Lambda_n = \sqrt{2n \log\log(n)}$ from
LIL if the random variables are independent. Examples of random variables which are not independent are
$X_n = \cos(n\sqrt{2})$.


Statistics is the science of modeling random events in a probabilistic setup. Given data points, we want to
find a model which fits the data best. This allows us to understand the past, predict the future or discover
laws of nature. The most common task is to find the mean and the standard deviation of some data. The
mean is also called the average and given by $m = \frac{1}{n} \sum_{k=1}^n x_k$. The variance is $\sigma^2 = \frac{1}{n} \sum_{k=1}^n (x_k - m)^2$ with
standard deviation $\sigma$.


A sequence of random variables $X_n$ defines a so called stochastic process. In continuous versions of such
processes, $X_t$ is a curve of random variables. An important example is Brownian motion,
which is a model of a random particle.


Besides gambling and analyzing data, also physics was an important motivator to develop probability theory.
An example is statistical mechanics, where the laws of nature are studied with probabilistic methods. A
famous physical law is Ludwig Boltzmann's relation S = k log(W) for entropy, a formula which decorates
Boltzmann's tombstone. The entropy of a probability measure $P[\{k\}] = p_k$ on a finite set {1,...,n} is defined
as $S = -\sum_{i=1}^n p_i \log(p_i)$. Today, we would reformulate Boltzmann's law and say that it is the expectation
S = E[log(W)] of the logarithm of the "Wahrscheinlichkeit" random variable $W(i) = 1/p_i$ on $\Omega$ = {1,...,n}.
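
The entropy formula can be evaluated for small examples. In this sketch the two probability vectors are made up for illustration and the convention 0 log 0 = 0 is assumed.

```python
from math import log

def entropy(p):
    """S = -sum p_i log(p_i), with the convention 0 log 0 = 0."""
    return -sum(x * log(x) for x in p if x > 0)

def entropy_as_expectation(p):
    """The same number written as E[log(W)] with W(i) = 1/p_i."""
    return sum(x * log(1 / x) for x in p if x > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform), log(4))   # the uniform measure on n points has entropy log(n)
print(entropy(skewed))            # this non-uniform measure has smaller entropy
```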


Entropy is important because nature tries to maximize it.


Lecture 9: Topology


Topology studies properties of geometric objects which do not change under continuous reversible
deformations. In topology, a coffee cup with a single handle is the same as a doughnut. One can deform one into the
other without punching any holes in it or ripping it apart. Similarly, a plate and a croissant are the same. But
a croissant is not equivalent to a doughnut. On a doughnut, there are closed curves which can not be pulled
together to a point. For a topologist the letters O and P are equivalent but different from the letter B.
The mathematical setup is beautiful: a topological space is a set X with a set O of subsets of X containing
both $\emptyset$ and X such that finite intersections and arbitrary unions in O are in O. Sets in O are called open sets
and O is called a topology. The complement of an open set is called closed. Examples of topologies are the
trivial topology $O = \{\emptyset, X\}$, where no open sets besides the empty set and X exist, or the discrete topology
$O = \{A \mid A \subset X\}$, where every subset is open. But these are in general not interesting. An important example
on the plane X is the collection O of sets U in the plane X for which every point of U is the center of a small disc
still contained in U. A special class of topological spaces are metric spaces, where a set X is equipped with a
distance function $d(x,y) = d(y,x) \ge 0$ which satisfies the triangle inequality $d(x,y) + d(y,z) \ge d(x,z)$ and
for which d(x,y) = 0 if and only if x = y. A set U in a metric space is open if to every x in U, there is a ball
$B_r(x) = \{y \mid d(x,y) < r\}$ of positive radius r contained in U. Metric spaces are topological spaces but not vice
versa: the trivial topology for example does not in general come from a metric.

For doing calculus on a topological space X, each
point should have a neighborhood called chart which is topologically equivalent to a disc in Euclidean space. Finitely
many neighborhoods covering X form an atlas of X. If the charts are glued together with identification maps
on the intersection one obtains a manifold. Two dimensional examples are the sphere, the torus, the
projective plane or the Klein bottle. Topological spaces X,Y are called homeomorphic, meaning "topologically
equivalent", if there is an invertible map from X to Y such that this map induces an invertible map on the
corresponding topologies. How can one decide whether two spaces are equivalent in this sense? The surface of
the coffee cup for example is equivalent in this sense to the surface of a doughnut but it is not equivalent to the
surface of a sphere. Many properties of geometric spaces can be understood by discretizing them, for example with a graph.
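
On a finite set, the axioms for a topology from the beginning of this lecture can be checked by a program. This is a small sketch with a hypothetical helper is_topology; for a finite collection O it is enough to test intersections and unions of pairs.

```python
from itertools import combinations

def is_topology(X, O):
    """Check the axioms on a finite set X: O contains the empty set and X and is
    closed under intersections and unions (pairwise checks suffice when O is finite)."""
    X = frozenset(X)
    O = {frozenset(U) for U in O}
    if frozenset() not in O or X not in O:
        return False
    return all(U & V in O and U | V in O for U, V in combinations(O, 2))

X = {1, 2, 3}
trivial = [set(), X]
discrete = [set(c) for r in range(4) for c in combinations(X, r)]
not_closed = [set(), {1}, {2}, X]        # the union {1, 2} is missing

print(is_topology(X, trivial), is_topology(X, discrete), is_topology(X, not_closed))
```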
A graph is a finite collection of vertices V together with a finite set of edges E, where each edge connects two
points in V. For example, the set V of cities in the US where the edges are pairs of cities connected by a street
is a graph. The Königsberg bridge problem was a trigger puzzle for the study of graph theory. Polyhedra
were another start in graph theory. Their study is loosely related to the analysis of surfaces. The reason is that
one can see polyhedra as discrete versions of surfaces. In computer graphics for example, surfaces are rendered
as finite graphs, using triangulations. The Euler characteristic of a convex polyhedron is a remarkable
topological invariant. It is V - E + F = 2, where V is the number of vertices, E the number of edges and F the
number of faces. This number is equal to 2 for connected polyhedra in which every closed loop can be pulled
together to a point. This formula for the Euler characteristic is also called Euler's gem. It comes with a rich
history. René Descartes stumbled upon it and wrote it down in a secret notebook. It was Leonhard Euler
who in 1752 was the first to prove the formula for convex polyhedra. A convex polyhedron is called a Platonic
solid, if all vertices are on the unit sphere, all edges have the same length and all faces are congruent polygons.
A theorem of Theaetetus states that there are only five Platonic solids: [Proof: Assume the faces are regular
n-gons and m of them meet at each vertex. Beside the Euler relation V - E + F = 2, a polyhedron also satisfies
the relations nF = 2E and mV = 2E which come from counting vertices or edges in different ways. This gives
2E/m - E + 2E/n = 2 or 1/n + 1/m = 1/E + 1/2. Since 1/n + 1/m > 1/2 with $n \ge 3$ and $m \ge 3$, we see that it is
impossible that both m and n are larger than 3. There are now only two possibilities: either n = 3 or m = 3. In
the case n = 3 we have m = 3,4,5; in the case m = 3 we have n = 3,4,5. The five possibilities (3,3), (3,4), (3,5),
(4,3), (5,3) represent the five Platonic solids.] The pairs (n,m) are called the Schläfli symbol of the polyhedron:


Name           V    E    F    V-E+F   Schläfli
tetrahedron    4    6    4    2       {3,3}
hexahedron     8    12   6    2       {4,3}
octahedron     6    12   8    2       {3,4}
dodecahedron   20   30   12   2       {5,3}
icosahedron    12   30   20   2       {3,5}
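
The counting argument in the proof can be turned into a short search. The sketch below tries all pairs n, m from 3 up to an arbitrary search limit and solves the three relations for E, F and V; the function name and the limit are choices made here.

```python
from fractions import Fraction

def platonic_solids(limit=20):
    """Search all pairs n, m >= 3 (below the search limit) with 1/n + 1/m > 1/2
    and solve 1/n + 1/m = 1/E + 1/2, nF = 2E, mV = 2E for E, F, V."""
    solids = []
    for n in range(3, limit):
        for m in range(3, limit):
            excess = Fraction(1, n) + Fraction(1, m) - Fraction(1, 2)
            if excess > 0:
                E = 1 / excess
                V, F = 2 * E / m, 2 * E / n
                solids.append(((n, m), int(V), int(E), int(F)))
    return solids

for symbol, V, E, F in platonic_solids():
    print(symbol, V, E, F, V - E + F)
```

The search returns exactly the five rows of the table, in a different order.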


The Greeks proceeded geometrically: Euclid showed in the "Elements" that each vertex can have either 3,4 or 5
equilateral triangles attached, 3 squares or 3 regular pentagons. (Each corner must have at least 3 faces,
and 6 triangles, 4 squares or 4 pentagons would lead to a total angle which is too large.) Simon Antoine-Jean
L'Huilier refined in 1813 Euler's formula to situations with holes: V - E + F = 2 - 2g,
where g is the number of holes. For a doughnut it is V - E + F = 0. Cauchy first proved that there are 4
non-convex regular Kepler-Poinsot polyhedra.


Name                           V    E    F    V-E+F   Schläfli
small stellated dodecahedron   12   30   12   -6      {5/2, 5}
great dodecahedron             12   30   12   -6      {5, 5/2}
great stellated dodecahedron   20   30   12   2       {5/2, 3}
great icosahedron              12   30   20   2       {3, 5/2}


If two different face types are allowed but each vertex still looks the same, one obtains 13 semi-regular
polyhedra. They were first studied by Archimedes (287-212 BC). Since his work is lost, Johannes Kepler is considered
the first since antiquity to describe all of them in his "Harmonices Mundi". The Euler characteristic for
surfaces is $\chi = 2 - 2g$ where g is the number of holes. The computation can be done by triangulating the surface.
The Euler characteristic characterizes smooth compact surfaces if they are orientable. A non-orientable surface,
the Klein bottle, can be obtained by gluing ends of the Möbius strip. Classifying higher dimensional manifolds
is more difficult and finding good invariants is part of modern research. Higher analogues of polyhedra are
called polytopes (Alicia Boole Stott). Regular polytopes are the analogue of the Platonic solids in higher
dimensions. Examples:


dimension   name                   Schläfli symbols
2:          Regular polygons       {3}, {4}, {5}, ...
3:          Platonic solids        {3,3}, {3,4}, {3,5}, {4,3}, {5,3}
4:          Regular 4D polytopes   {3,3,3}, {4,3,3}, {3,3,4}, {3,4,3}, {5,3,3}, {3,3,5}
≥5:         Regular polytopes      {3,3,...,3}, {4,3,...,3}, {3,...,3,4}


Ludwig Schläfli saw in 1852 that there are exactly six convex regular 4-polytopes or polychora, where "Choros"
is Greek for "space". Schläfli's polyhedral formula V - E + F - C = 0 holds, where C
is the number of 3-dimensional chambers. In dimensions 5 and higher, there are only 3 types of regular
polytopes: the higher dimensional analogues of the tetrahedron, octahedron and the cube. A general formula
$\sum_{k=0}^{d-1} (-1)^k v_k = 1 - (-1)^d$ gives the Euler characteristic of a convex polytope in d dimensions with
k-dimensional parts $v_k$.


Lecture 10: Analysis


Analysis is a science of measure and optimization. As a rather diverse collection of mathematical fields, it con- 
tains real and complex analysis, functional analysis, harmonic analysis and calculus of variations. 
Analysis has relations to calculus, geometry, topology, probability theory and dynamical systems. We focus 
here mostly on "the geometry of fractals" which can be seen as part of dimension theory. Examples are Julia 
sets which belong to the subfield of "complex analysis" of "dynamical systems". "Calculus of variations" is 
illustrated by the Kakeya needle set in "geometric measure theory", "Fourier analysis" appears when looking 
at functions which have fractal graphs, "spectral theory" as part of functional analysis is represented by the 
"Hofstadter butterfly". We somehow describe the topic using "pop icons". 


A fractal is a set with non-integer dimension. An example is the Cantor set, as discovered in 1875 by Henry
Smith. Start with the unit interval. Cut the middle third, then cut the middle third from both parts, then the
middle thirds of the four parts etc. The limiting set is the Cantor set. The mathematical theory of fractals belongs
to measure theory and can also be thought of as a playground for real analysis or topology. The term fractal
had been introduced by Benoit Mandelbrot in 1975. Dimension can be defined in different ways. The simplest
is the box counting definition which works for most household fractals: if we need n squares of length r to
cover a set, then $d = -\log(n)/\log(r)$ converges to the dimension of the set with $r \to 0$. A curve
of length L for example needs L/r squares of length r so that its dimension is 1. A region of area A needs $A/r^2$
squares of length r to be covered and its dimension is 2. The Cantor set needs to be covered with $n = 2^m$ squares
of length $r = 1/3^m$. Its dimension is $-\log(n)/\log(r) = -m\log(2)/(m\log(1/3)) = \log(2)/\log(3)$. Examples of
fractals are the graph of the Weierstrass function (1872), the Koch snowflake (1904), the Sierpinski carpet (1915)
or the Menger sponge (1926).
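
The box counting for the Cantor set can be carried out with integer arithmetic. The sketch below represents the level m approximation by the left endpoints of its intervals, measured in units of 3^(-m); the function names and the levels 10 and 8 are choices made here.

```python
from math import log

def cantor_points(m):
    """Left endpoints of the 2**m intervals of level m, as integers in units of 3**-m:
    exactly the numbers whose m ternary digits are all 0 or 2."""
    points = [0]
    for _ in range(m):
        points = [3 * p + digit for p in points for digit in (0, 2)]
    return points

def box_dimension(m, j):
    """Cover the level-m approximation by boxes of length r = 3**-j (with 1 <= j <= m)
    and return -log(n)/log(r), where n is the number of boxes needed."""
    boxes = {p // 3 ** (m - j) for p in cantor_points(m)}
    n, r = len(boxes), 3.0 ** -j
    return -log(n) / log(r)

print(box_dimension(10, 8), log(2) / log(3))
```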

Complex analysis extends calculus to the complex numbers. It deals with functions f(z) defined in the complex plane.
Integration is done along paths. Complex analysis completes the understanding about functions. It also provides
more examples of fractals by iterating functions like the quadratic map $f(z) = z^2 + c$.
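
Iterating the quadratic map is a matter of a few lines. The sketch below starts at z = 0 and reports whether the orbit stays in a disc; the number of steps and the radius 2 are choices of this sketch, so the answer is only a finite test and not a proof of boundedness.

```python
def stays_bounded(c, steps=200, radius=2.0):
    """Iterate z -> z*z + c from z = 0 and report whether |z| stays within the radius."""
    z = 0j
    for _ in range(steps):
        z = z * z + c
        if abs(z) > radius:
            return False
    return True

for c in (0, -1, 1j, 1, 0.5):
    print(c, stays_bounded(c))
```

Coloring the parameters c in the plane according to this test produces the familiar pictures of the Mandelbrot set.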

Functions had been iterated before, for example in the Newton method (1879). The Julia sets were introduced in
1918, the Mandelbrot set in 1978 and the Mandelbar set in 1989. Particularly famous are the Douady rabbit,
the dragon, the dendrite and the airplane.

Calculus of variations is calculus in infinite dimensions. Taking derivatives is called taking "variations".
Historically, it started with the problem to find the curve of fastest fall, leading to the Brachistochrone
curve $\vec{r}(t) = (t - \sin(t), 1 - \cos(t))$. In calculus, we find maxima and minima of functions. In
calculus of variations, we extremize on much larger spaces. Here are examples of problems:
Brachistochrone 1696
Minimal surface 1760
Geodesics 1830
Isoperimetric problem 1838
Kakeya needle problem 1917

Fourier theory decomposes a function into basic components of various frequencies,
$f(x) = a_1 \sin(x) + a_2 \sin(2x) + a_3 \sin(3x) + \cdots$. The numbers $a_n$ are called the Fourier
coefficients. Our ear does such a decomposition when we listen to music. By distinguishing different
frequencies, our ear produces a Fourier analysis.

Fourier series 1729
Fourier transform (FT) 1811
Discrete FT Gauss?
Wavelet transform 1930
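The decomposition into frequencies can be tried out numerically. The sketch below recovers the coefficients $a_n$ of a sine series from samples of the function, using the standard formula $a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx) \, dx$ and a midpoint rule. The test signal and the function names are assumptions chosen for illustration.

```python
import math


def sine_coefficient(f, n, samples=2000):
    """Approximate a_n = (1/pi) * integral over [-pi, pi] of f(x) sin(n x) dx (midpoint rule)."""
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * math.sin(n * x)
    return total * h / math.pi


def signal(x):
    """Hypothetical test signal with a_1 = 1, a_3 = 0.5 and all other coefficients 0."""
    return math.sin(x) + 0.5 * math.sin(3 * x)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, round(sine_coefficient(signal, n), 6))
```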


The Weierstrass function mentioned above is given as a series $\sum_n a^n \cos(\pi b^n x)$ with $0 < a < 1$,
$ab > 1 + 3\pi/2$. The dimension of its graph is believed to be $2 + \log(a)/\log(b)$ but no rigorous
computation of the dimension has been done yet.

Spectral theory analyzes linear maps $L$. The spectrum consists of the real numbers $E$ such that
$L - E$ is not invertible. A Hollywood celebrity among all linear maps is the almost Mathieu operator
$L(x)_n = x_{n+1} + x_{n-1} + (2 - 2\cos(cn)) x_n$: if we draw the spectrum for each $c$, we see the Hofstadter
butterfly. For fixed $c$ the map describes the behavior of an electron in an almost periodic crystal.
Another famous system is the quantum harmonic oscillator, $L(f) = f''(x) + f(x)$, or the vibrating drum
$L(f) = f_{xx} + f_{yy}$, where $f$ is the amplitude of the drum and $f = 0$ on the boundary of the drum.


Hydrogen atom 1914
Hofstadter butterfly 1976
Harmonic oscillator 1900
Vibrating drum 1680

All these examples in analysis look unrelated at first. Fractal geometry ties many of them together: spectra are
often fractals, minimal configurations have fractal nature, like in solid state physics or in diffusion limited
aggregation or in other critical phenomena like percolation phenomena, cracks in solids or the formation
of lightning bolts. In Hamiltonian mechanics, minimal energy configurations are often fractals, as in Mather
theory. And solutions to minimizing problems lead to fractals in a natural way, like when you have the task to
turn around a needle on a table by 180 degrees and minimize the area swept out by the needle. The minimal
turn leads to a Kakeya set, which is a fractal. Finally, let us mention some unsolved problems in analysis: does
the Riemann zeta function $\zeta(z) = \sum_{n=1}^{\infty} 1/n^z$ have all nontrivial roots on the axis
$\mathrm{Re}(z) = 1/2$? This question is called the Riemann hypothesis and is the most important open problem
in mathematics. It is an example of a question in analytic number theory which also illustrates how analysis
has entered into number theory. Some mathematicians think that spectral theory might solve it. Also the
Mandelbrot set $M$ is not understood yet: the "holy grail" in the field of complex dynamics is the problem
whether $M$ is locally connected. From the Hofstadter butterfly one knows that it has measure zero. What is its
dimension? Another open question in spectral theory is the "can one hear the shape of a drum" problem which
asks whether there are two convex drums which are not congruent but which have the same spectrum. In the area
of calculus of variations, just one problem: how long is the shortest curve in space such that its convex hull
(the union of all possible connections between two points on the curve) contains the unit ball?

Lecture 11: Cryptography


Cryptography is the theory of codes. Two important aspects of the field are the encryption and decryption
of information and error correction. Both are crucial in daily life. When getting access to a computer, viewing 
a bank statement or when taking money from the ATM, encryption algorithms are used. When phoning, surfing 
the web, accessing data on a computer or listening to music, error correction algorithms are used. Since more
and more of our lives has become digital (music, movies, books, journals, finance, transportation, medicine,
and communication), we rely on strong error correction to avoid errors and on encryption
to assure that things can not be tampered with. Without error correction, airplanes would crash: small errors
in the memory of a computer would produce glitches in the navigation and control program. In a computer 
memory every hour a couple of bits are altered, for example by cosmic rays. Error correction assures that this 
gets fixed. Without error correction music would sound like a 1920 gramophone record. Without encryption, 
anybody could break into electronic banks and transfer money. Medical history shared with your doctor would
all be public. Before the digital age, error correction was assured by extremely redundant information storage. 
Writing a letter on a piece of paper displaces billions of billions of molecules in ink. Now, changing any single 
bit could give a letter a different meaning. Before the digital age, information was kept in well guarded safes 
which were physically difficult to penetrate. Now, information is locked up in computers which are connected 
to other computers. Vaults, money or voting ballots are secured by mathematical algorithms which assure 
that information can only be accessed by authorized users. Also life needs error correction: information in the
genome is stored in a genetic code, where error correction makes sure that life can survive. A cosmic ray
hitting the skin changes the DNA of a cell, but in general this is harmless. Only a larger amount of radiation 
can render cells cancerous. 

How can an encryption algorithm be safe? One possibility is to invent a new method and keep it secret.
Another is to use a well known encryption method and rely on the difficulty of mathematical computation
tasks to assure that the method is safe. History has shown that the first method is unreliable. Systems which
rely on "security through obfuscation" usually do not last. The reason is that it is tough to keep a method
secret if the encryption tool is distributed. Reverse engineering of the method is often possible, for example
using plain text attacks. Given a map $T$, a third party can compute pairs $x, T(x)$ and by choosing specific
texts figure out what happens.

The Caesar cypher permutes the letters of the alphabet. We can for example replace every letter A with
B, every letter B with C and so on until finally Z is replaced with A. The word "Mathematics" is then
encrypted as "Nbuifnbujdt". Caesar would shift the letters by 3. The right shift just discussed was used by
his nephew Augustus. Rot13 shifts by 13, and the Atbash cypher reflects the alphabet, switching A with Z, B
with Y etc. The last two examples are involutive: encryption is decryption. More general cyphers are obtained
by permuting the alphabet. Because of $26! = 403291461126605635584000000 \approx 4 \cdot 10^{26}$ permutations,
it appears at first that a brute force attack is not possible. But Caesar cyphers can be cracked very quickly
using statistical analysis. If we know the frequency with which letters appear and match the frequency of a
text, we can figure out which letter was replaced with which. The Trithemius cypher prevents this simple
analysis by changing the permutation in each step. It is called a polyalphabetic substitution cypher. Instead
of a single permutation, there are many permutations. After transcoding a letter, we also change the key. Let
us take a simple example. Rotate for the first letter the alphabet by 1, for the second letter the alphabet by
2, for the third letter the alphabet by 3 etc. The word "Mathematics" now becomes "Ncwljshbrmd". Note that the
two letters "a" have been translated to different letters. A frequency analysis is now more difficult. The
Vigenère cypher adds even more complexity: instead of shifting the alphabet by 1, we can take a key like
"BCNZ", then shift the first letter by 1, the second letter by 2, the third letter by 13, the fourth letter by
25, then shift the 5th letter by 1 again. While this cypher remained unbroken for long, a more sophisticated
frequency analysis which involves first finding the length of the key makes the cypher breakable. With the
emergence of computers, even more sophisticated versions like the German Enigma had no chance.
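The shift cyphers described above are easy to try out. The following sketch uses hypothetical function names and assumes that only the letters A-Z and a-z are shifted, with case preserved and the key letter A standing for shift 0. It implements the Caesar shift, the Trithemius rotation by 1, 2, 3, ... and the Vigenère cypher, with Atbash as the reflection.

```python
import string


def shift_letter(ch, k):
    """Rotate a single ASCII letter by k places; leave all other characters alone."""
    if ch not in string.ascii_letters:
        return ch
    base = ord("A") if ch.isupper() else ord("a")
    return chr(base + (ord(ch) - base + k) % 26)


def caesar(text, k):
    """Shift every letter by the same amount k (k = 13 gives Rot13)."""
    return "".join(shift_letter(ch, k) for ch in text)


def trithemius(text):
    """Shift the first letter by 1, the second by 2, the third by 3, and so on."""
    return "".join(shift_letter(ch, i + 1) for i, ch in enumerate(text))


def vigenere(text, key):
    """Shift the i-th letter by the alphabet position of the i-th key letter (A = 0)."""
    shifts = [ord(c) - ord("A") for c in key.upper()]
    return "".join(shift_letter(ch, shifts[i % len(shifts)]) for i, ch in enumerate(text))


def atbash(text):
    """Reflect the alphabet: A <-> Z, B <-> Y, etc."""
    return "".join(shift_letter(ch, 25 - 2 * (ord(ch.lower()) - ord("a")))
                   if ch in string.ascii_letters else ch for ch in text)


if __name__ == "__main__":
    print(caesar("Mathematics", 1))
    print(trithemius("Mathematics"))
    print(vigenere("Mathematics", "BCNZ"))
```

Decryption uses the negative shifts; for Rot13 and Atbash, applying the same function twice returns the original text, which is the involution property mentioned above.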

Diffie-Hellman key exchange allows Ana and Bob to agree on a secret key over a public channel. The
two palindromic friends agree on a prime number $p$ and a base $a$. This information can be exchanged over an
open channel. Ana now chooses a secret number $x$ and sends $X = a^x$ modulo $p$ to Bob over the channel. Bob
chooses a secret number $y$ and sends $Y = a^y$ modulo $p$ to Ana. Ana can compute $Y^x$ and Bob can compute
$X^y$, and both are equal to $a^{xy}$ modulo $p$. This number is their common secret. The key point is that the
eavesdropper Eve can not compute this number. The only information available to Eve are $X$ and $Y$, as well as
the base $a$ and $p$. Eve knows that $X = a^x$ but can not determine $x$. The key difficulty in this code is the
discrete log problem: getting $x$ from $a^x$ modulo $p$ is believed to be difficult for large $p$.
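The exchange can be followed step by step with small numbers. This is a toy sketch only: the prime $p = 23$, the base $a = 5$ and the secrets $x = 4$, $y = 3$ are assumptions chosen so that the arithmetic can be checked by hand, and real use requires very large, carefully chosen primes.

```python
def public_value(base, secret, p):
    """What a participant sends over the open channel: base**secret mod p."""
    return pow(base, secret, p)


def shared_secret(other_public, secret, p):
    """What a participant computes from the other side's public value."""
    return pow(other_public, secret, p)


if __name__ == "__main__":
    p, a = 23, 5          # public prime and base (toy values)
    x, y = 4, 3           # Ana's and Bob's secret numbers (toy values)
    X = public_value(a, x, p)
    Y = public_value(a, y, p)
    print(X, Y, shared_secret(Y, x, p), shared_secret(X, y, p))
```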

The Rivest-Shamir-Adleman public key system uses an RSA public key $(n, a)$ with an integer $n = pq$ and
$a < (p-1)(q-1)$, where $p, q$ are prime and $a$ has no common factor with $(p-1)(q-1)$. Also here, $n$ and $a$
are public. Only the factorization of $n$ is kept secret. Ana publishes this pair. Bob, who wants to email Ana a
message $x$, sends her $y = x^a \bmod n$. Ana, who has computed $b$ with $ab = 1 \bmod (p-1)(q-1)$, can read the
secret email $y$ because $y^b = x^{ab} = x^{1 + k(p-1)(q-1)} = x \bmod n$. But Eve has no chance because the only
thing Eve knows is $y$ and $(n, a)$. It is believed that without the factorization of $n$, it is not possible to
determine $x$. The message has been transmitted securely. The core difficulty is that taking roots in the ring
$Z_n = \{0, \dots, n-1\}$ is difficult without knowing the factorization of $n$. With a factorization, we can
quickly take arbitrary roots. If we can take square roots, then we can also factor: assume we have a product
$n = pq$ and we know how to take square roots of 1. If $x$ solves $x^2 = 1 \bmod n$ and $x$ is different from
$1$ and $-1$ modulo $n$, then $x^2 - 1 = (x-1)(x+1)$ is zero modulo $n$. This means that $p$ divides $(x-1)$ or
$(x+1)$. To find a factor, we can take the greatest common divisor of $n$ and $x - 1$. Take $n = 77$ for example.
We are given the root 34 of 1 ($34^2 = 1156$ has remainder 1 when divided by 77). The greatest common divisor of
$34 - 1$ and 77 is 11, a factor of 77. Similarly, the greatest common divisor of $34 + 1$ and 77 is 7, which
divides 77. Finding roots modulo a composite number and factoring the number are equally difficult.
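Both halves of this paragraph can be replayed with $n = 77$. The sketch below is a toy illustration with hypothetical function names; the public exponent $a = 7$ and the message $x = 2$ are assumptions, and the modular inverse via `pow(a, -1, m)` needs Python 3.8 or later.

```python
from math import gcd


def make_keys(p, q, a):
    """Return the public key (n, a) and the private exponent b with a*b = 1 mod (p-1)(q-1)."""
    n, phi = p * q, (p - 1) * (q - 1)
    if gcd(a, phi) != 1:
        raise ValueError("a must have no common factor with (p-1)(q-1)")
    return (n, a), pow(a, -1, phi)


def encrypt(x, public):
    n, a = public
    return pow(x, a, n)


def decrypt(y, public, b):
    n, _ = public
    return pow(y, b, n)


def factors_from_root(n, x):
    """Given x with x*x = 1 mod n and x != 1, -1 mod n, return two nontrivial factors of n."""
    return gcd(x - 1, n), gcd(x + 1, n)


if __name__ == "__main__":
    public, b = make_keys(7, 11, 7)
    y = encrypt(2, public)
    print(public, b, y, decrypt(y, public, b))
    print(factors_from_root(77, 34))
```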


Cipher           Used for                 Difficulty              Attack

Caesar           transmitting messages    many permutations       Statistics
Vigenère         transmitting messages    many permutations       Statistics
Enigma           transmitting messages    no frequency analysis   Plain text
Diffie-Hellman   agreeing on secret key   discrete log mod p      Unsafe primes
RSA              electronic commerce      factoring integers      Factoring


The simplest error correcting code uses 3 copies of the same information so that a single error can be
corrected. With 3 watches for example, one watch can fail. But this basic error correcting code is not
efficient. It can correct single errors by tripling the size. Its efficiency is 33 percent.
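The triple repetition code is short enough to write down completely. In this sketch (function names are hypothetical) each bit is sent three times and decoded by majority vote, so one flipped bit per block of three is corrected.

```python
def encode(bits):
    """Repeat every bit three times."""
    return [b for b in bits for _ in range(3)]


def decode(received):
    """Majority vote on each block of three received bits."""
    if len(received) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]


if __name__ == "__main__":
    message = [1, 0, 1]
    sent = encode(message)
    sent[4] ^= 1                      # a single transmission error
    print(sent, decode(sent))
```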
Lecture 12: Dynamical systems


Dynamical systems theory is the science of time evolution. If time is continuous, the evolution is defined
by a differential equation $\dot{x} = f(x)$. If time is discrete, then we look at the iteration of a map
$x \to T(x)$.

The goal of the theory is to predict the future of the system when the present state is known. A differential
equation is an equation of the form $d/dt \, x(t) = f(x(t))$, where the unknown quantity is a path $x(t)$ in
some "phase space". If we know the velocity $d/dt \, x(t) = \dot{x}(t)$ at all times and the initial
configuration $x(0)$, we can compute the trajectory $x(t)$. What happens at a future time? Does $x(t)$ stay in a
bounded region or escape to infinity? Which areas of the phase space are visited and how often? Can we reach a
certain part of the space when starting at a given point and if yes, when? An example of such a question is to
predict whether an asteroid located at a specific location will hit the earth or not. Another example is to
predict the weather of the next week.


An example of a dynamical system in one dimension is the differential equation

$x'(t) = x(t)(2 - x(t)), \quad x(0) = 1$.

It is called the logistic system and describes population growth. This system has the solution
$x(t) = 2e^{2t}/(1 + e^{2t})$, as you can see by computing the left and right hand side.
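The logistic system can also be integrated numerically and compared with a closed form. The sketch below uses the classical fourth order Runge-Kutta scheme, which is an illustration choice and not taken from the notes, and compares the result with $2e^{2t}/(1 + e^{2t})$, the solution of $x' = x(2 - x)$ with $x(0) = 1$. All function names are hypothetical.

```python
import math


def f(x):
    """Right hand side of the logistic equation x' = x (2 - x)."""
    return x * (2.0 - x)


def exact(t):
    """Closed form solution with x(0) = 1."""
    return 2.0 * math.exp(2.0 * t) / (1.0 + math.exp(2.0 * t))


def rk4(rhs, x0, t_end, steps):
    """Integrate x' = rhs(x) from t = 0 to t_end with the classical Runge-Kutta method."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


if __name__ == "__main__":
    print(rk4(f, 1.0, 1.0, 100), exact(1.0))
```

The solution increases from 1 towards the equilibrium value 2, which is the saturation expected in a population model.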


A map is a rule which assigns to a quantity $x(t)$ a new quantity $x(t+1) = T(x(t))$. The state $x(t)$ of the
system determines the situation $x(t+1)$ at time $t+1$. An example is the Ulam map $T(x) = 4x(1-x)$ on
the interval $[0,1]$. This is an example where we have no idea what happens after a few hundred iterates, even
if we knew the initial position with the accuracy of the Planck scale.


Dynamical system theory has applications in all fields of mathematics. It can be used to find roots of
equations $f(x) = 0$, as with the Newton map

$T(x) = x - f(x)/f'(x)$.

A system of number theoretical nature is the Collatz map

$T(x) = x/2$ (even $x$), $3x + 1$ else.

A system of geometric nature is the Pedal map which assigns to a triangle the pedal triangle.
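Two of these maps are easy to iterate on a computer. The sketch below, with hypothetical function names, applies the Newton map to $f(x) = x^2 - 2$ (an example chosen here) and counts how many Collatz steps a starting value needs to reach 1. That every positive integer eventually reaches 1 is a well known open conjecture, so the loop is only known to terminate for inputs that have been checked.

```python
def newton_map(f, df):
    """Return the map T(x) = x - f(x) / f'(x)."""
    return lambda x: x - f(x) / df(x)


def iterate(T, x, n):
    """Apply the map T to x a total of n times."""
    for _ in range(n):
        x = T(x)
    return x


def collatz(x):
    """One step of the Collatz map on positive integers."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


def collatz_steps(x):
    """Number of iterations needed to reach 1 from x."""
    steps = 0
    while x != 1:
        x = collatz(x)
        steps += 1
    return steps


if __name__ == "__main__":
    T = newton_map(lambda x: x * x - 2, lambda x: 2 * x)
    print(iterate(T, 1.0, 8))
    print(collatz_steps(27))
```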

About 100 years ago, Henri Poincaré was able to deal with chaos of low dimensional systems. While
statistical mechanics had formalized the evolution of large systems with probabilistic methods already, the
new insight was that simple systems like a three body problem or a billiard map can produce very
complicated motion. It was Poincaré who saw that even for such low dimensional and completely deterministic
systems, random motion can emerge. While physicists had dealt with chaos earlier by assuming it or
artificially feeding it into equations like the Boltzmann equation, the occurrence of stochastic motion in
geodesic flows or billiards or restricted three body problems was a surprise. These findings needed half a
century to sink in and only with the emergence of computers in the 1960s did the awakening happen. Icons like
Lorenz helped to popularize the findings and we owe them the "butterfly effect" picture: a wing of a butterfly
can produce a tornado in Texas in a few weeks. The reason for this statement is that the complicated equations
to simulate the weather reduce under extreme simplifications and truncations to a simple differential equation
$\dot{x} = \sigma(y - x), \dot{y} = rx - y - xz, \dot{z} = xy - bz$, the Lorenz system. For
$\sigma = 10, r = 28, b = 8/3$, Ed Lorenz discovered in 1963 an interesting long time behavior and an aperiodic
"attractor". Ruelle-Takens called it a strange attractor. It is a great moment in mathematics to realize that
attractors of simple systems can become fractals on which the motion is chaotic. It suggests that such behavior
is abundant. What is chaos? If a dynamical system shows sensitive dependence on initial conditions, we talk
about chaos. We will experiment with the two maps $T(x) = 4x(1-x)$ and $S(x) = 4x - 4x^2$ which, starting with
the same initial conditions, will produce different outcomes after a couple of iterations.

The sensitive dependence on initial conditions is measured by how fast the derivative $dT^n$ of the $n$'th
iterate grows. The exponential growth rate $\gamma$ is called the Lyapunov exponent. A small error of the size
$h$ will be amplified to $h e^{\gamma n}$ after $n$ iterates. In the case of the Logistic map with $c = 4$, the
Lyapunov exponent is $\log(2)$ and an error of $10^{-16}$ is amplified to $2^n \cdot 10^{-16}$. For time
$n = 53$ already the error is of the order 1. This explains the above experiment with the different maps. The
maps $T(x)$ and $S(x)$ round differently on the level $10^{-16}$. After 53 iterations, these initial
fluctuation errors have grown to a macroscopic size.
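The experiment with the two algebraically identical maps can be run directly in double precision arithmetic. The sketch below uses hypothetical function names, and the starting value 0.3 and the threshold 0.1 are assumptions. It reports the first iterate at which the two orbits visibly differ, and separately evaluates the amplification $h e^{\gamma n}$ for $\gamma = \log(2)$. The exact step at which the orbits separate depends on the floating point arithmetic of the machine, so no particular value is claimed here.

```python
import math


def T(x):
    return 4.0 * x * (1.0 - x)


def S(x):
    return 4.0 * x - 4.0 * x * x


def first_visible_gap(x0, steps=200, threshold=0.1):
    """First n with |T^n(x0) - S^n(x0)| > threshold, or None if none occurs within steps."""
    x, y = x0, x0
    for n in range(1, steps + 1):
        x, y = T(x), S(y)
        if abs(x - y) > threshold:
            return n
    return None


def amplified_error(h, n, gamma=math.log(2.0)):
    """Size of an initial error h after n iterates with growth rate gamma."""
    return h * math.exp(gamma * n)


if __name__ == "__main__":
    print("orbits separate at step:", first_visible_gap(0.3))
    print("1e-16 after 53 steps:", amplified_error(1e-16, 53))
```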

Here is a famous open problem which has resisted many attempts to solve it: Show that the map
$T(x,y) = (c\sin(2\pi x) + 2x - y, x)$ with $T^n(x,y) = (f_n(x,y), g_n(x,y))$ has sensitive dependence on
initial conditions on a set of positive area. More precisely, verify that for $c > 2$ and all $n$,
$\frac{1}{n} \int_0^1 \int_0^1 \log|\partial_x f_n(x,y)| \, dx \, dy \geq \log(c/2)$. The
left hand side converges to the average of the Lyapunov exponents which is in this case also the entropy of the
map. For some systems, one can compute the entropy. The logistic map with $c = 4$ for example, which is also
called the Ulam map, has entropy $\log(2)$. The cat map

$T(x,y) = (2x + y, x + y) \bmod 1$

has positive entropy $\log|(\sqrt{5} + 3)/2|$. This is the logarithm of the larger eigenvalue of the matrix
implementing $T$.
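The entropy of the cat map can be computed from the matrix with rows $(2, 1)$ and $(1, 1)$. A small sketch, with hypothetical function names, solves the characteristic polynomial of a $2 \times 2$ matrix and takes the logarithm of the larger eigenvalue:

```python
import math


def larger_eigenvalue(a, b, c, d):
    """Larger real eigenvalue of the matrix [[a, b], [c, d]]."""
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are not real")
    return (trace + math.sqrt(disc)) / 2


def cat_map(x, y):
    """One step of T(x, y) = (2x + y, x + y) mod 1."""
    return (2 * x + y) % 1.0, (x + y) % 1.0


if __name__ == "__main__":
    lam = larger_eigenvalue(2, 1, 1, 1)
    print(lam, math.log(lam))
```

Since the matrix has trace 3 and determinant 1, the eigenvalues are $(3 \pm \sqrt{5})/2$, in agreement with the value quoted above.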

While questions about simple maps look artificial at first, the mechanisms prevail in other systems: in
astronomy, when studying planetary motion or electrons in the van Allen belt, in mechanics when studying coupled
pendula or nonlinear oscillators, in fluid dynamics when studying vortex motion or turbulence, in geometry,
when studying the evolution of light on a surface, the change of weather or tsunamis in the ocean. Dynamical
systems theory started historically with the problem to understand the motion of planets. Newton realized
that this is governed by a differential equation, the n-body problem

$\ddot{x}_j(t) = \sum_{i \neq j} c_{ij} \frac{x_i - x_j}{|x_i - x_j|^3}$,

where $c_{ij}$ depends on the masses and the gravitational constant. If one body is the sun and no interaction
of the planets is assumed and using the common center of gravity as the origin, this reduces to the Kepler
problem $\ddot{x}(t) = -Cx/|x|^3$, where planets move on ellipses, the radius vector sweeps equal areas in equal
times and the period squared is proportional to the semi-major axis cubed. A great moment in astronomy was when
Kepler derived these laws empirically. Another great moment in mathematics is Newton's theoretical derivation
from the differential equations.

Lecture 13: Computing


Computing deals with algorithms and the art of programming. While the subject intersects with computer
science and information technology, the theory is by nature very mathematical. But there are new aspects:
computers have opened the field of experimental mathematics and serve now as the laboratory for new
mathematics. Computers are not only able to simulate more and more of our physical world, they allow us to
explore new worlds.

A mathematician pioneering new grounds with computer experiments does work similar to that of an experimental
physicist. Computers have blurred the boundaries between physics and mathematics. According to Borwein
and Bailey, experimental mathematics consists of:


Gain insight and intuition          Explore possible new results
Find patterns and relations         Suggest approaches for proofs
Display mathematical principles     Automate lengthy hand derivations
Test and falsify conjectures        Confirm already existing proofs


When using computers to prove things, reading and verifying the computer program is part of the proof. If
Goldbach's conjecture were known to be true for all $n > 10^{18}$, the conjecture should be accepted because
numerical verifications have been done up to $2 \cdot 10^{18}$ as of today. The first famous theorem proven
with the help of a computer was the "4 color theorem" in 1976. Here are some pointers in the history of computing:


2700BC  Sumerian Abacus          1935  Zuse 1 programmable        1973  Windowed OS
200BC   Chinese Abacus           1941  Zuse 3                     1975  Altair 8800
150BC   Astrolabe                1943  Harvard Mark I             1976  Cray I
125BC   Antikythera              1944  Colossus                   1977  Apple II
1300    Modern Abacus            1946  ENIAC                      1981  Windows I
1400    Yupana                   1947  Transistor                 1983  IBM PC
1600    Slide rule               1948  Curta Gear Calculator      1984  Macintosh
1623    Schickard computer       1952  IBM 701                    1985  Atari
1642    Pascal Calculator        1958  Integrated circuit         1988  Next
1672    Leibniz multiplier       1969  Arpanet                    1989  HTTP
1801    Punch cards              1971  Microchip                  1993  Web browser, PDA
1822    Difference Engine        1972  Email                      1998  Google
1876    Mechanical integrator    1972  HP-35 calculator           2007  iPhone
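The numerical verification of Goldbach's conjecture mentioned before the timeline is, in its simplest form, a short experiment. The sketch below uses hypothetical function names and plain trial division, which is far from the sieving methods needed for large ranges. It searches for a decomposition of an even number into two primes and checks a small range.

```python
def is_prime(n):
    """Trial division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def goldbach_pair(n):
    """Return primes (p, q) with p <= q and p + q = n, or None if there are none."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None


if __name__ == "__main__":
    print(goldbach_pair(100))
    print(all(goldbach_pair(n) is not None for n in range(4, 2001, 2)))
```

Such a check can falsify a conjecture by finding a counterexample, but a finite search never proves it.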


We live in a time where technology explodes exponentially. Moore's law from 1965 predicted that semiconductor
technology doubles in capacity and overall performance every 2 years. This has happened since. Futurologists
like Ray Kurzweil conclude from this that a technological singularity is ahead, in which artificial
intelligence might take over. An important question is how to decide whether a computation is "easy" or "hard".
In 1937, Alan Turing introduced the idea of a Turing machine, a theoretical model of a computer which allows to
quantify complexity. It has finitely many states $S = \{s_1, \dots, s_n, h\}$ and works on a tape of 0-1
sequences. The state $h$ is the "halt" state. If it is reached, the machine stops. The machine has rules which
tell what it does if it is in state $s$ and reads a letter $a$. Depending on $s$ and $a$, it writes 1 or 0 or
moves the tape to the left or right and moves into a new state. Turing showed that anything we know to compute
today can be computed with Turing machines. For any known machine, there is a polynomial $p$ so that a
computation done in $k$ steps with that computer can be done in $p(k)$ steps on a Turing machine. What can
actually be computed? Church's thesis of 1934 states that everything which can be computed can be computed with
Turing machines. Similarly as in mathematics itself, there are limitations of computing. Turing's setup allowed
him to enumerate all possible Turing machines and use them as input of another machine. Denote by $TM$ the set
of all pairs $(T,x)$, where $T$ is a Turing machine and $x$ is a finite input. Let $H \subset TM$ denote the set
of Turing machines $(T,x)$ which halt with the tape $x$ as input. Turing looked at the decision problem: is
there a machine which decides whether a given machine $(T,x)$ is in $H$ or not. An ingenious diagonal argument
of Turing shows that the answer is "no".
[Proof: assume there is a machine HALT which returns from the input $(T,x)$ the output HALT$(T,x)$ = true
if $T$ halts with the input $x$, and otherwise returns HALT$(T,x)$ = false. Turing constructs a Turing machine
DIAGONAL, which does the following: 1) Read x. 2) Define Stop=HALT(x,x). 3) While Stop=True repeat
Stop:=True. 4) Stop.
Now, DIAGONAL is either in $H$ or not. If DIAGONAL is in $H$, then the variable Stop is true which means
that the machine DIAGONAL runs for ever and DIAGONAL is not in $H$. But if DIAGONAL is not in $H$, then
the variable Stop is false which means that the loop 3) is never entered and the machine stops. The machine is
in $H$.]
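To make the notion of a Turing machine concrete, here is a small simulator. It is a sketch under assumptions: the rule format (each rule writes a symbol, moves the head and changes the state in one step) is one common variant and differs slightly from the description above, the blank symbol `_` and the step budget are choices made here, and the binary increment machine is a hypothetical example. The step budget is needed precisely because, by the argument above, no program can decide in general whether a machine halts.

```python
def run_turing_machine(rules, tape, state="s1", halt="h", blank="_", max_steps=10_000):
    """Run a machine given as rules[(state, symbol)] = (write, move, new_state).

    move is "L", "R" or "N". Returns the tape content without surrounding blanks.
    """
    cells = dict(enumerate(tape))
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; the machine may not halt")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: add 1 to a binary number written on the tape.
INCREMENT = {
    ("s1", "0"): ("0", "R", "s1"),   # walk to the right end of the number
    ("s1", "1"): ("1", "R", "s1"),
    ("s1", "_"): ("_", "L", "s2"),
    ("s2", "1"): ("0", "L", "s2"),   # carry: 1 becomes 0, keep moving left
    ("s2", "0"): ("1", "N", "h"),    # absorb the carry and halt
    ("s2", "_"): ("1", "N", "h"),    # number was all ones: new leading digit
}

if __name__ == "__main__":
    print(run_turing_machine(INCREMENT, "1011"))
```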
Let us go back to the problem of distinguishing "easy" and "hard" problems: One calls P the class of decision
problems that are solvable in polynomial time and NP the class of decision problems which can efficiently be
tested if the solution is given. These categories do not depend on the computing model used. The question
"P=NP?" is the most important open problem in theoretical computer science. It is one of the seven
millennium problems and it is widely believed that $P \neq NP$. If a problem in NP is such that every other NP
problem can be reduced to it, it is called NP-complete. Popular games like Minesweeper or Tetris are
NP-complete. If $P \neq NP$, then there is no efficient algorithm to beat the game. The intersection of NP-hard
and NP is the class of NP-complete problems. An example of an NP-complete problem is the balanced number
partitioning problem: given $n$ positive integers, divide them into two subsets $A, B$, so that the sum in $A$
and the sum in $B$ are as close as possible. A first shot: choose the largest remaining number and distribute
it alternately to the two sets.
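The "first shot" heuristic and the difficulty of the exact problem can be compared on a small instance. In the sketch below (function names and the example numbers are assumptions) the heuristic runs in a single pass over the sorted numbers, while the exact answer is found by trying all $2^n$ subsets, which is only feasible for small $n$.

```python
from itertools import combinations


def alternating_split(numbers):
    """Deal the numbers, largest first, alternately to the sets A and B."""
    a, b = [], []
    for i, x in enumerate(sorted(numbers, reverse=True)):
        (a if i % 2 == 0 else b).append(x)
    return a, b


def difference(a, b):
    """Verifying a proposed split is easy: just compare the two sums."""
    return abs(sum(a) - sum(b))


def best_difference(numbers):
    """Smallest possible difference, found by brute force over all subsets."""
    total = sum(numbers)
    return min(abs(total - 2 * sum(subset))
               for r in range(len(numbers) + 1)
               for subset in combinations(numbers, r))


if __name__ == "__main__":
    numbers = [4, 5, 6, 7, 8]
    a, b = alternating_split(numbers)
    print(a, b, difference(a, b), best_difference(numbers))
```

For the numbers 4, 5, 6, 7, 8 the heuristic is off by 6, while the split into {8, 7} and {6, 5, 4} is perfectly balanced.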
We all feel that it is harder to find a solution to a problem than to verify a solution. One way functions,
functions which are easy to compute but hard to invert, can only exist if $P \neq NP$. For some important
problems, we do not even know whether they are in P. An example is the integer factoring problem. An
efficient algorithm for it would have enormous consequences. Finally, let us look at some mathematical
problems in artificial intelligence AI:


problem solving     playing games like chess, performing algorithms, solving puzzles
pattern matching    speech, music, image, face, handwriting, plagiarism detection, spam
reconstruction      tomography, city reconstruction, body scanning
research            computer assisted proofs, discovering theorems, verifying proofs
data mining         knowledge acquisition, knowledge organization, learning
translation         language translation, porting applications to programming languages
creativity          writing poems, jokes, novels, music pieces, painting, sculpture
simulation          physics engines, evolution of bots, game development, aircraft design
inverse problems    earthquake location, oil depository, tomography
prediction          weather prediction, climate change, warming, epidemics, supplies


ABOUT THIS DOCUMENT


It should have become obvious that I’m reporting on many of these theorems as a tourist and
not as a local. In a few areas I could qualify as a tour guide but hardly as a local. The
references contain only sources which have been consulted, but this does not imply that I know all of
each source. My own background was in dynamical systems theory and mathematical physics.
Both of these subjects by nature have many connections with other branches of mathematics.


The motivation to try such a project came through teaching a course called Math E 320 at the
Harvard extension school. This math-multi-disciplinary course is part of the “math for teaching
program", and tries to map out the major parts of mathematics and visit some selected places
on 12 continents.


It is wonderful to visit other places and see connections. One can learn new things, relearn
old ones and marvel again about how large and diverse mathematics is but still notice how
many similarities there are between seemingly remote areas. A goal of this project is also to
get back up to speed, to the level of a first year grad student (one forgets a lot of things over
the years) and maybe pass the qualifying exams (with some luck).


This summer 2018 project also illustrates the challenges when trying to tour the most important 
mountain peaks in the mathematical landscape with limited time. Already the identification 
of major peaks and attaching a “height" can be challenging. Which theorems are the most 
important? Which are the most fundamental? Which theorems provide fertile seeds for new 
theorems? I recently got asked by some students what I consider the most important theorem 
in mathematics (my answer had been the “Atiyah-Singer theorem"). 


Theorems are the entities which build up mathematics. Mathematical ideas show their merit
only through theorems. Theorems not only help to bring ideas to life, they in turn allow us to
solve problems and justify the language or theory. But not only the results alone, also the
history and the connections with the mathematicians who created the results are fascinating.


The first version of this document got started in May 2018 and was posted in July 2018. Com- 
ments, suggestions or corrections are welcome. I hope to be able to extend, update and clarify 
it and explore also still neglected continents in the future if time permits. 


It should be pretty obvious that one can hardly do justice to all mathematical fields and that
much more would be needed to cover the essentials. A more serious project would be to
identify a dozen theorems in each of the major MSC classification fields. The current MSC2020
classification system has now 64 major entries and thousands of sub-entries listed on 120 pages
[521]. But even a “thousand and one theorem" list would only be the tip of the iceberg. Such
a list exists already: on Wikipedia, there are currently about 1000 theorems discussed. The
one-document project getting closest to this project is maybe the beautiful book [515].


251. DOCUMENT HISTORY


The first draft was posted on July 22, 2018 [400]. On July 23, 2018, a short list of theorems 
was made available on [401]. This document history section got started on July 25-27, 2018. 


July 28 2018: Entry 36 had been a repeated prime number theorem entry. Its alternative is now the Fredholm alternative. 
Also added are the Sturm theorem and Smith normal form. 

July 29: The two entries about Lidskii theorem and Radon transform are added. 

July 30: An entry about linear programming. 

July 31: An entry about random matrices. 

August 2: An entry about entropy of diffeomorphisms 

August 4: 104-108 entries: linearization, law of small numbers, Ramsey, Fractals and Poincare duality. 

August 5: 109-111 entries: Rokhlin and Lax approximation, Sobolev embedding 

August 6: 112: Whitney embedding. 

August 8: 113-114: AI and Stokes entries 


August 12: 115 and 116: Moment entry and martingale theorem
August 13: 117 and 118: theorema egregium and Shannon theorem
August 14: 119 mountain pass
August 15: 120, 121, 122, 123 exponential sums, sphere theorem, word problem and finite simple groups
August 16: 124, 125, 126, Rubik, Sard and Elliptic curves
August 17: 127, 128, 129 billiards, uniformization, Kalman filter
August 18: 130, 131 Zariski and Poincare’s last theorem
August 19: 132, 133 Geometrization, Steinitz
August 21: 134, 135 Hilbert-Einstein, Hall marriage
August 22: 136-140
August 24: 141-142
August 25: 143-144
August 27: 145-149
August 28: 150-151
August 31: 152
September 1: 153-155
September 2: 156
September 8: 157,158
September 14 2018: 159-161
September 25 2018: 162-164
March 17 2019: 165-169
March 20, 2019: section on paradigms
March 21, 2019: 170
March 27, 2019, 171
June 20, 2019, 172
August 6, 2020, 173-174, deepness section started
August 8, 2020, 175-177, more on deepness section
August 18, 2020, 178,179
August 19, 2020, 180,181,182
August 20, 2020, section on essential math, 183-185
August 24, 2020, 186,187
August 25, 2020, 188,189,190,191
August 26, 2020, 192, 193
August 27, 2020, 194 - 200
August 28, 2020, 201, 202
August 30, 2020, 203
August 31, 2020, 204,205
September 5, 2020, 206,207
September 6-8, 2020, 208-212
September 9, 2020, 213-214
September 10, 2020, 215,216,217,218
September 21, 2020, 219
October 2, 2020, 220-221
October 8, 2020, 222-223
October 12, 2020, 224-225
November 4, 2020, 226-227
November 5, 2020, 228-231
November 6, 2020, 232
November 16, 2020, 233-234
November 25, 2020, 235-236
December 3, 2020, 237-238
December 4, 2020, 239
January 20, 2021, 240-243
May 11, 2021, 244
February 2, 2022, 245-250


252. TOP CHOICE


The short list of 10 theorems mentioned in the YouTube clip was:


Fundamental theorem of arithmetic (prime factorization) 
Fundamental theorem of geometry (Pythagoras theorem) 
Fundamental theorem of logic (incompleteness theorem) 
Fundamental theorem of topology (rule of product) 
Fundamental theorem of computability (Turing computability) 
Fundamental theorem of calculus (Stokes theorem) 
Fundamental theorem of combinatorics (pigeonhole principle)
Fundamental theorem of analysis (spectral theorem) 
Fundamental theorem of algebra (polynomial factorization) 
Fundamental theorem of probability (central limit theorem) 


Let me try to justify this shortlist. It should go without saying that similar arguments could be 
stated for any other choice, except maybe for the five classical fundamental theorems: Arithmetic, 
Geometry (which is undisputedly Pythagoras), Calculus and Algebra, where one can hardly 
argue much: except for the Pythagorean theorem, their given names already suggest that they 
are considered fundamental. Here is some reflection: 


Analysis. Why choose the spectral theorem and not, say, the more general Jordan 
normal form theorem? This is not an easy call, but the Jordan normal form 
theorem is less simple to state and, furthermore, it does not stress the importance of 
normality, which gives the possibility for a functional calculus. Also, the spectral theorem 
holds in infinite dimensions for operators on Hilbert spaces. If one looks at mathematical 
physics for example, then it is the functional calculus of operators which is really 
made use of; the Jordan normal form theorem appears rarely in comparison. In infinite 
dimensions, a Jordan normal form theorem would be much more difficult as the operator 
$Au(n) = u(n+1)$ on $\ell^2(\mathbb{Z})$ is both unitary and a “Jordan form matrix”. The spectral 
theorem however sails through smoothly to infinite dimensions and even applies with 
adaptations to unbounded self-adjoint operators which are important in physics. 
And as it is a core part of analysis, it is also fine to see the theorem as part of 
analysis. The main reason of course is that the fundamental theorem of algebra is 
already occupied by a theorem. One could object that “analysis" is already represented 
by the fundamental theorem of calculus but calculus is so important that it can represent 
its own field. The idea of the fundamental theorem of calculus goes beyond calculus. It 
is essentially a cancellation property, a telescopic sum or Pauli principle ($d^2 = 0$ 
for exterior derivatives) which makes the principle work. Calculus is the idea of an 
exterior derivative, the idea of cohomology, a link between algebra and geometry. One 
can see calculus also as a theory of “time”. In some sense, the fundamental theorem of 
calculus also represents the field of differential equations and this is what “time is all 
about”. 
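
The cancellation described above can be made concrete on a small graph. The following is a minimal sketch that is not taken from the text: the function names `gradient`, `curl` and `line_sum` and the sample complex are illustrative assumptions. It checks that summing a discrete derivative along a path telescopes to a boundary term, and that applying the derivative twice gives zero.

```python
# Discrete exterior derivative on a tiny oriented simplicial complex.
# All names and the sample data are illustrative assumptions.

def gradient(f, edges):
    """d0: a function on vertices becomes a function on oriented edges (a, b)."""
    return {(a, b): f[b] - f[a] for (a, b) in edges}

def curl(g, triangles):
    """d1: a function on edges becomes a function on oriented triangles (a, b, c)."""
    return {(a, b, c): g[(a, b)] + g[(b, c)] - g[(a, c)] for (a, b, c) in triangles}

def line_sum(g, path):
    """Sum an edge function along a path of vertices, following the edge orientations."""
    return sum(g[(a, b)] for a, b in zip(path, path[1:]))

EDGES = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
TRIANGLES = [(0, 1, 2), (1, 2, 3)]
f = {0: 5, 1: -2, 2: 7, 3: 11}

df = gradient(f, EDGES)
# Telescoping: interior values cancel, only the boundary term f(3) - f(0) survives.
print(line_sum(df, [0, 1, 2, 3]), f[3] - f[0])
# d(df) vanishes on every triangle.
print(curl(df, TRIANGLES))
```

In this toy setting the sum along the path plays the role of the integral and the difference of the end values plays the role of the boundary term.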

Probability. One can also ask why to pick the central limit theorem and not, say, 
the Bayes formula or the deeper law of the iterated logarithm. One objection 
against the Bayes formula is that it is essentially a definition, like the basic arithmetic 
properties “commutativity, distributivity or associativity" in an algebraic structure like 
a ring. One does not present the identity a+ b = b+ a for example as a fundamental 
theorem. Yes, the Bayes theorem has an unusually high appeal to scientists as it appears 
like a magic bullet, but for a mathematician, the statement just does not have enough 
beef: it is a definition, not a theorem. This is not to belittle the Bayes theorem: like the notion 
of entropy or the notion of logarithm, it is an ingenious concept. But it is not an 
actual theorem, as the cleverness of the statement of Bayes lies in the definition and 
so in the clarification of conditional probability theory. For the central limit theorem, it is 
pretty clear that it should be high up on any list of theorems, as the name suggests: it is 
central. But also, it actually is stronger than some versions of the law of large numbers. 
The strong law is also superseded by Birkhoff’s ergodic theorem which is much more 
general. One could argue to pick the law of iterated logarithm or some Martingale 
theorem instead but there is something appealing in the central limit theorem which 
goes over to other set-ups. One can formulate the central limit theorem also for random 
variables taking values in a compact topological group like when doing statistics with 
spherical data [518]. Another pitch for the central limit theorem is that it is a fixed 
point of a renormalization map $X \to X + X$ (where the right hand side is the sum 
of two independent copies of $X$, rescaled to keep the variance fixed) in the space of 
random variables. This map increases entropy and the fixed point is a random variable 
whose probability density function $f$ has the maximal entropy 
$-\int_{\mathbb{R}} f(x) \log(f(x)) \, dx$ among all probability density functions with the 
same mean and variance. The 
entropy principle justifies essentially all known probability density functions. Nature 
just likes to maximize entropy and minimize energy or more generally - in the presence 
of energy - to minimize the free energy. 
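
The renormalization picture can be explored numerically. The sketch below makes several assumptions that are not in the text: it starts with a fair coin, it uses the kurtosis $E[X^4]/E[X^2]^2$ as a rough indicator of closeness to the Gaussian (the value 3 is necessary but not sufficient), and the function names are hypothetical. Because the kurtosis does not change under rescaling, the normalization step of the map can be skipped.

```python
from fractions import Fraction

def convolve(p, q):
    """Distribution of the sum of two independent integer-valued random variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

def moment(p, k):
    """k-th moment E[X^k] of a distribution given as {value: probability}."""
    return sum(prob * x ** k for x, prob in p.items())

def kurtosis(p):
    """E[X^4] / E[X^2]^2 for a centered X; scale invariant, equal to 3 for a Gaussian."""
    return moment(p, 4) / moment(p, 2) ** 2

def renormalization_orbit(p, steps):
    """Kurtosis along the orbit of X -> X + X (sum of two independent copies)."""
    values = [kurtosis(p)]
    for _ in range(steps):
        p = convolve(p, p)
        values.append(kurtosis(p))
    return values

coin = {-1: Fraction(1, 2), 1: Fraction(1, 2)}  # fair coin: mean 0, variance 1
for k in renormalization_orbit(coin, 5):
    print(k, float(3 - k))
```

For this starting point the gap to the Gaussian value 3 halves with every application of the map, which is one elementary way to see the fixed point attracting.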

Topology. Topology is about geometric properties which do not change under contin- 
uous deformation or more generally under homotopies. Quantities which are invariant 
under homeomorphisms are interesting. Such quantities should add up under disjoint 
unions of geometries and multiply under products. The Euler characteristic is the proto- 
type. Taking products is fundamental for building up Euclidean spaces (also over other 
fields, not only the real numbers) which locally patch up more complicated spaces. It is 
the essence of vector spaces that after building a basis, one has a product of Euclidean 
spaces. Field extensions can be seen therefore as product spaces. How does the counting 
principle come in? As stated, it actually is quite strong and calling it a “fundamental 
principle of topology" can be justified if the product of topological spaces is defined 
properly: if $1$ is the one-point space, one can see the statement $G \times 1 = G_1$ as the 
Barycentric refinement of G, implying that the Euler characteristic is a Barycentric 
invariant and so that it is a “counting tool" which can be pushed to the continuum, to 
manifolds or varieties. And the compatibility with the product is the key to make it 
work. Counting in the form of Euler characteristic goes throughout mathematics, combinatorics, 
differential geometry or algebraic geometry. Riemann-Roch or Atiyah-Singer 
and even dynamical versions like the Lefschetz fixed point theorem (which generalizes 
the Brouwer fixed point theorem) or the even more general Atiyah-Bott theorem can be 
seen as extending the basic counting principle: the Lefschetz number $\chi(X,T)$ 
is a dynamical Euler characteristic which in the static case $T = \mathrm{Id}$ reduces to the Euler 
characteristic $\chi(X)$. In “school mathematics”, one calls the principle the “fundamental 
principle of counting” or “rule of product”. It is put in the following way: “If we have 
$k$ ways to do one thing and $m$ ways to do another thing, then we have $k \cdot m$ ways to 
do both”. It is so simple that one can argue that it is overrepresented in teaching, but 
it is indeed important. [61] makes the point that it should be considered a foundation 
stone of combinatorics. 

Why is the multiplicative property more fundamental than the additive counting 
principle? It is again that the additive property is essentially placed in as a definition of 
what a valuation is. It is in the in-out-formula $\chi(A \cup B) + \chi(A \cap B) = \chi(A) + \chi(B)$. 
Now, this inclusion-exclusion formula is also important in combinatorics but it is already 
in the definition of what we call counting or “adding things up”. The multiplicative 
property on the other hand is not a definition; it actually is quite non-trivial. It characterizes 
classical mathematics, as quantum mechanics or non-commutative flavors 
of mathematics have shown that one can extend things. So, if the “rule of product” 
(which is taught in elementary school) is beefed up to be more geometric and interpreted 
in terms of Euler characteristic, it becomes fundamental. 
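
The additive and multiplicative properties of the Euler characteristic can be checked on small examples. The sketch below works with finite abstract simplicial complexes and rests on one explicit assumption: the product of two complexes is taken to be the order complex of the pairs $(x,y)$ of simplices, ordered componentwise by inclusion. This is one possible reading of “defined properly”, and all function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

def complex_from(facets):
    """All non-empty faces of the given facets (a finite abstract simplicial complex)."""
    return {frozenset(c) for facet in facets
            for k in range(1, len(facet) + 1)
            for c in combinations(facet, k)}

def euler(simplices):
    """Euler characteristic: signed count of simplices with sign (-1)^dim."""
    return sum((-1) ** (len(x) - 1) for x in simplices)

def order_complex_euler(elements, less):
    """Euler characteristic of the order complex of a finite poset,
    that is, the signed count of all chains x0 < x1 < ... < xk."""
    elements = list(elements)

    @lru_cache(maxsize=None)
    def chains_from(x):
        # Signed count of chains with minimum x: the chain {x} itself, minus
        # (one dimension higher) every chain starting strictly above x.
        return 1 - sum(chains_from(y) for y in elements if less(x, y))

    return sum(chains_from(x) for x in elements)

def barycentric_euler(G):
    """chi of the Barycentric refinement: simplices of G ordered by strict inclusion."""
    return order_complex_euler(G, lambda x, y: x < y)

def product_euler(G, H):
    """chi of the product, taken here as the order complex of pairs (x, y)."""
    pairs = [(x, y) for x in G for y in H]

    def less(p, q):
        return p != q and p[0] <= q[0] and p[1] <= q[1]

    return order_complex_euler(pairs, less)

circle = complex_from([(0, 1), (1, 2), (0, 2)])
sphere = complex_from([(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])
eight = complex_from([(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (0, 4)])

print(euler(eight), barycentric_euler(eight))
print(product_euler(eight, sphere), euler(eight) * euler(sphere))
```

The additive rule needs no computation beyond set operations, since `euler` is a sum over simplices; the multiplicative rule is the one that requires building a new object and counting again.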

Combinatorics. The pigeonhole principle stresses the importance of order structure, 
partially ordered sets (posets) and cardinality or comparisons of cardinality. The point 
for posets is made in [551], who writes: “The biggest lesson I learned from Richard Stanley’s 
work is, combinatorial objects want to be partially ordered!” The use of injective functions 
to express cardinality is a key part of Cantor’s work. Like some of the ideas of Grothendieck, it 
is of “infantile simplicity” (quote Grothendieck about schemes) but powerful. It allowed 
for the stunning result that there are different infinities. One of the reasons for the success 
of Cantor’s set theory is the immediate applicability. For any new theory, one has to 
ask: “does it tell me something I did not know?” In set theory, the larger cardinality 
of the reals (uncountable) compared with the cardinality of the algebraic numbers (countable) 
immediately gave the existence of transcendental numbers. This is very elegant. 
The pigeonhole principle similarly gives combinatorial results which are non-trivial and 
elegant. Currently, searching for “the fundamental theorem of combinatorics" gives the 
“rule of product". As explained above, we gave it a geometric spin and placed it into 
topology. Now, combinatorics and topology have always been very hard to distinguish. 
Euler, who somehow booted up topology by reducing the Königsberg problem to a 
problem in graph theory, did that already. Combinatorial topology is essentially part of 
topology. Today, some very geometric topics like algebraic geometry have been placed 
within pure commutative algebra (this is how I myself was exposed to algebraic 
geometry). On the other hand, some very hard core combinatorial problems like the 
upper bound conjecture have been proven with algebro-geometric methods like toric 
varieties which are geometric. In any case, order structures are important everywhere 
and the pigeonhole principle justifies the importance of order structures. 
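
As one illustration of a non-trivial consequence of the pigeonhole principle, consider the classical fact that among any $n$ integers there is a contiguous block whose sum is divisible by $n$. The example is chosen here for illustration and is not taken from the text; the function name is hypothetical. The proof is the algorithm: the $n+1$ prefix sums fall into $n$ residue classes, so two of them must agree.

```python
def divisible_block(numbers):
    """Return (i, j) with i < j such that sum(numbers[i:j]) is divisible by n = len(numbers).

    Pigeonhole: the n + 1 prefix sums s_0, ..., s_n fall into n residue
    classes mod n, so two of them agree, and their difference is a block sum.
    """
    n = len(numbers)
    if n == 0:
        raise ValueError("need at least one number")
    first_seen = {0: 0}            # residue of the empty prefix s_0
    prefix = 0
    for j, value in enumerate(numbers, start=1):
        prefix = (prefix + value) % n
        if prefix in first_seen:   # two pigeons in one hole
            return first_seen[prefix], j
        first_seen[prefix] = j
    raise AssertionError("unreachable by the pigeonhole principle")

sample = [3, 8, 1, 9, 4]
i, j = divisible_block(sample)
print(sample[i:j], sum(sample[i:j]))
```

The final `raise` can never trigger, which is exactly the content of the principle.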

Computation. There is no official “fundamental theorem of computer science” but 
the Turing completeness theorem comes up as a top candidate when searching on 
search engines. Using Turing machines, Turing formalized in a precise way what computing 
is, and even what a proof is. This nails down the mathematical activity of running an 
algorithm or argument in a mathematical way. It is also pure as it is not hardware 
dependent. One can only appreciate Turing’s definition if one sees how different 
programming languages can look and also, in logic, what different types of frameworks 
have been invented. Turing breaks all this complexity with a machine which 
can be itself part of mathematics, leading to the Halting problem illustrating the basic 
limitations of computation. Quantum computing would add a hardware component 
and might break through the Turing-Church thesis that everything we can compute 
can be computed with Turing machines in the same complexity class. Gödel and Turing 
are related and the Turing incompleteness theorem has a flavor similar to the Gödel 
incompleteness theorems. There is another angle to it and that is the question of 
complexity. I would predict that most mathematicians would currently favor the 
Platonic view of the Church thesis and predict that also new paradigms like quantum 
computing will never go beyond Turing computability, nor even break through 
complexity barriers like P-NP thresholds. It is just that the Turing completeness 
theorem is too beautiful to be spoiled by a different type of complexity tied to a physical 
world. The point of view is that anything we see in the physical world can in principle 
be computed with a machine without changing the complexity class. But that 
picture could be as naive as Hilbert’s dream one hundred years ago. Still, whatever 
happens in the future, the Turing completeness theorem remains a theorem. Theorems 
stay true. 
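
To see how little is needed to make the notion of computation precise, here is a minimal sketch of a one-tape Turing machine simulator. It is an illustration under stated assumptions and not a construction from the text: the transition table format, the state names and the example machine `INCREMENT` (binary increment) are all hypothetical choices.

```python
# A minimal one-tape Turing machine simulator (illustrative sketch).
# transitions maps (state, symbol) -> (symbol to write, head move, next state).

MOVES = {"L": -1, "R": 1, "N": 0}

def run(transitions, tape, state="start", blank="_", max_steps=10_000):
    """Run the machine until it reaches the state "halt"; return the final tape content."""
    cells = dict(enumerate(tape))   # sparse tape: position -> symbol
    head, steps = 0, 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; the machine may not halt")
        symbol = cells.get(head, blank)
        write, move, state = transitions[(state, symbol)]
        cells[head] = write
        head += MOVES[move]
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Hypothetical example machine: add one to a binary number.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),   # walk right to the end of the input
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),   # step back onto the last digit
    ("carry", "1"): ("0", "L", "carry"),   # 1 + 1 = 0, the carry moves left
    ("carry", "0"): ("1", "N", "halt"),    # absorb the carry
    ("carry", "_"): ("1", "N", "halt"),    # new leading digit
}

print(run(INCREMENT, "1011"))
```

The step budget `max_steps` is a practical concession: by the Halting problem, no general procedure can decide in advance whether an arbitrary machine will stop, so a simulator can only give up after a chosen number of steps.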

Logic. One can certainly argue whether it would be justified to have Gödel’s theorem 
replaced by a theorem in category theory like the Yoneda lemma. The Yoneda result is 
not easy to state and it does not yet produce an “Aha moment” like Gödel’s theorem 
does (the liar’s paradox explains the core of Gödel’s theorem, and it was successfully 
popularized in [319]). Maybe the Yoneda theorem will hit the pop culture in the future, 
when all mathematics has been naturally and pedagogically well expressed in categorical 
language. I’m personally not sure whether this will ever happen: not everything which 
is nice has also penetrated large parts of mathematics: an example is given by 
non-standard analysis, which makes calculus orders of magnitude easier and which is 
related also to surreal numbers, which are the most “natural” numbers. Both concepts 
have not entered calculus or algebra textbooks and there are reasons: the subjects need 
mathematical maturity and one can easily make mistakes. (I myself use non-standard 
analysis on an intuitive level as presented by Nelson and think, for example, of a compact 
set as a finite set, where basic theorems require almost no 
proof, like the Bolzano theorem telling that a continuous function on a compact set takes 
a maximum.) But using non-standard analysis would be a “no-no” both in teaching as 
well as when formulating mathematical thoughts for others who are not familiar with the 
three additional axioms of IST within ZFC of Nelson. It is non-standard and true to its 
name. An example where something was once pop culture but then was sidelined is given by the 
quaternions. It might be a topic which has a comeback. Fashion is hard to predict. Also, 
much of category theory still feels just like a huge conglomerate of definitions. There 
is lots of dough in the form of definitions and little raisins in the form of theorems. 
Historically, also the language of set theory has been overkill, especially in education, 
where it has led to “new math” controversies in the 1960s. The work of Russell and 
Whitehead demonstrates how clumsy things can become if boiled down to the small 
pieces. We humans like to think and program in higher order structures: rather 
than doing assembly coding, we like to work in object oriented languages which give 
more insight. But we also make use of the fact that higher order code can be boiled down to 
assembly, closer to what the basic instructions are. This is similar in mathematics and, 
also in the future, a topologist working in 4-manifold theory will hardly think about all the 
definitions in terms of sets, for reasons similar to why a modern computer algebra system 
does not break down all the objects into lists and lists of lists (even though that is what they 
often are). Category theory has a chance to change the landscape because it is close to 
computer science and to natural data structures. It is more pictorial and flexible than 
set theory alone. It definitely has been very successful in finding new structures and seeing 
connections within different fields like computer science [543]. It also has led to more 
flexible axiom systems. 